(57) Abstract

Resource requests made by clients of origin servers in a network are intercepted by reflector mechanisms and selectively reflected to 
other servers called repeaters. The reflectors select a best repeater from a set of possible repeaters and redirect the client to the selected 
best repeater. The client then makes the request of the selected best repeater. The resource is possibly rewritten to replace at least some of 
the resource identifiers contained therein with modified resource identifiers designating the repeater instead of the origin server. 
OPTIMIZED NETWORK RESOURCE LOCATION

1. Field of the Invention 

This invention relates to replication of resources in computer networks. 

2. Background of the Invention 

The advent of global computer networks, such as the Internet, has led to
entirely new and different ways to obtain information. A user of the Internet can now
access information from anywhere in the world, with no regard for the actual location of
either the user or the information. A user can obtain information simply by knowing a
network address for the information and providing that address to an appropriate
application program such as a network browser.

The rapid growth in popularity of the Internet has imposed a heavy traffic
burden on the entire network. Solutions to problems of demand (e.g., better
accessibility and faster communication links) only increase the strain on the supply.
Internet Web sites (referred to here as "publishers") must handle ever-increasing
bandwidth needs, accommodate dynamic changes in load, and improve performance for
distant browsing clients, especially those overseas. The adoption of content-rich
applications, such as live audio and video, has further exacerbated the problem.

To address basic bandwidth growth needs, a Web publisher typically subscribes
to additional bandwidth from an Internet service provider (ISP), whether in the form of
larger or additional "pipes" or channels from the ISP to the publisher's premises, or in
the form of large bandwidth commitments in an ISP's remote hosting server collection.
These increments are not always as fine-grained as the publisher needs, and quite often
lead times can cause the publisher's Web site capacity to lag behind demand.

To address more serious bandwidth growth problems, publishers may develop
more complex and costly custom solutions. The solution to the most common need,
increasing capacity, is generally based on replication of hardware resources and site
content (known as mirroring), and duplication of bandwidth resources. These solutions,
however, are difficult and expensive to deploy and operate. As a result, only the largest
publishers can afford them, since only those publishers can amortize the costs over
many customers (and Web site hits).

A number of solutions have been developed to advance replication and
mirroring. In general, these technologies are designed for use by a single Web site and
do not include features that allow their components to be shared by many Web sites
simultaneously.

Some solution mechanisms offer replication software that helps keep mirrored
servers up-to-date. These mechanisms generally operate by making a complete copy of a
file system. One such system operates by transparently keeping multiple copies of a file
system in synch. Another system provides mechanisms for explicitly and regularly
copying files that have changed. Database systems are particularly difficult to replicate,
as they are continually changing. Several mechanisms allow for replication of databases,
although there are no standard approaches for accomplishing it. Several companies
offering proxy caches describe them as replication tools. However, proxy caches differ
because they are operated on behalf of clients rather than publishers.

Once a Web site is served by multiple servers, a challenge is to ensure that the
load is appropriately distributed or balanced among those servers. Domain
name-server-based round-robin address resolution causes different clients to be directed
to different mirrors.

Another solution, load balancing, takes into account the load at each server
(measured in a variety of ways) to select which server should handle a particular request.
Load balancers use a variety of techniques to route the request to the appropriate
server. Most of those load-balancing techniques require that each server be an exact
replica of the primary Web site. Load balancers do not take into account the "network
distance" between the client and candidate mirror servers.

Assuming that client protocols cannot easily change, there are two major
problems in the deployment of replicated resources. The first is how to select which
copy of the resource to use. That is, when a request for a resource is made to a single
server, how should the choice of a replica of the server (or of that data) be made. We
call this problem the "rendezvous problem". There are a number of ways to get clients
to rendezvous at distant mirror servers. These technologies, like load balancers, must
route a request to an appropriate server, but unlike load balancers, they take network
performance and topology into account in making the determination.

A number of companies offer products which improve network performance by
prioritizing and filtering network traffic.

Proxy caches provide a way for client aggregators to reduce network resource
consumption by storing copies of popular resources close to the end users. A client
aggregator is an Internet service provider or other organization that brings a large
number of clients operating browsers to the Internet. Client aggregators may use proxy
caches to reduce the bandwidth required to serve web content to these browsers.
However, traditional proxy caches are operated on behalf of Web clients rather than
Web publishers.

Proxy caches store the most popular resources from all publishers, which means
they must be very large to achieve reasonable cache efficiency. (The efficiency of a
cache is defined as the number of requests for resources which are already cached
divided by the total number of requests.)

Proxy caches depend on cache control hints delivered with resources to
determine when the resources should be replaced. These hints are predictive, and are
necessarily often incorrect, so proxy caches frequently serve stale data. In many cases,
proxy cache operators instruct their proxy to ignore hints in order to make the cache
more efficient, even though this causes it to more frequently serve stale data.

Proxy caches hide the activity of clients from publishers. Once a resource is
cached, the publisher has no way of knowing how often it was accessed from the cache.

SUMMARY OF THE INVENTION

This invention provides a way for servers in a computer network to off-load
their processing of requests for selected resources by determining a different server (a
"repeater") to process those requests. The selection of the repeater can be made
dynamically, based on information about possible repeaters.

If a requested resource contains references to other resources, some or all of
these references can be replaced by references to repeaters.

Accordingly, in one aspect, this invention is a method of processing resource
requests in a computer network. First a client makes a request for a particular resource
from an origin server, the request including a resource identifier for the particular
resource, the resource identifier sometimes including an indication of the origin server.
Requests arriving at the origin server do not always include an indication of the origin
server; since they are sent to the origin server, they do not need to name it. A
mechanism referred to as a reflector, co-located with the origin server, intercepts the
request from the client to the origin server and decides whether to reflect the request or
to handle it locally. If the reflector decides to handle the request locally, it forwards it to
the origin server, otherwise it selects a "best" repeater to process the request. If the
request is reflected, the client is provided with a modified resource identifier designating
the repeater.

The client gets the modified resource identifier from the reflector and makes a
request for the particular resource from the repeater designated in the modified resource
identifier.

When the repeater gets the client's request, it responds by returning the
requested resource to the client. If the repeater has a local copy of the resource then it
returns that copy, otherwise it forwards the request to the origin server to get the
resource, and saves a local copy of the resource in order to serve subsequent requests.
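
The repeater behavior just described can be sketched as follows. This is only a minimal illustration under stated assumptions: the `Repeater` class, its `fetch_from_origin` callback (standing in for forwarding the request to the origin server), and the in-memory dictionary used as the local store are hypothetical names introduced here, not part of the described system.

```python
from typing import Callable, Dict


class Repeater:
    """Hypothetical repeater that serves a local copy when it has one."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        # fetch_from_origin stands in for forwarding the request to the origin server.
        self._fetch_from_origin = fetch_from_origin
        self._cache: Dict[str, bytes] = {}

    def handle_request(self, resource_id: str) -> bytes:
        # Return the local copy if one exists.
        if resource_id in self._cache:
            return self._cache[resource_id]
        # Otherwise get the resource from the origin server and keep a copy
        # so that subsequent requests can be served locally.
        resource = self._fetch_from_origin(resource_id)
        self._cache[resource_id] = resource
        return resource
```

With this sketch, a second request for the same resource identifier is answered from the local copy and does not reach the origin server.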

The selection by the reflector of an appropriate repeater to handle the request
can be done in a number of ways. In the preferred embodiment, it is done by first
pre-partitioning the network into "cost groups" and then determining which cost group the
client is in. Next, from a plurality of repeaters in the network, a set of repeaters is
selected, the members of the set having a low cost relative to the cost group which the
client is in. In order to determine the lowest cost, a table is maintained and regularly
updated to define the cost between each group and each repeater. Then one member of
the set is selected, preferably randomly, as the best repeater.
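
A sketch of this selection, assuming the client's cost group has already been determined: the function name `select_best_repeater`, the nested-dictionary layout of the cost table, and the `set_size` parameter (how many low-cost repeaters form the candidate set) are illustrative assumptions, not details given in the text.

```python
import random
from typing import Dict, List, Optional

# cost_table[group][repeater] is the cost between a cost group and a repeater.
CostTable = Dict[str, Dict[str, float]]


def select_best_repeater(
    client_group: str,
    cost_table: CostTable,
    set_size: int = 3,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick one repeater at random from the lowest-cost repeaters for a group."""
    rng = rng or random.Random()
    costs = cost_table[client_group]
    # Keep the set_size repeaters with the lowest cost for this cost group.
    candidates: List[str] = sorted(costs, key=lambda name: costs[name])[:set_size]
    if not candidates:
        raise ValueError("no repeaters known for group %r" % client_group)
    # Choose one member of the low-cost set at random.
    return rng.choice(candidates)
```

Choosing randomly within the low-cost set, rather than always taking the single cheapest repeater, spreads requests from one cost group across several nearby repeaters.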

If the particular requested resource itself can contain identifiers of other
resources, then the resource may be rewritten (before being provided to the client). In
particular, the resource is rewritten to replace at least some of the resource identifiers
contained therein with modified resource identifiers designating a repeater instead of the
origin server. As a consequence of this rewriting process, when the client requests other
resources based on identifiers in the particular requested resource, the client will make
those requests directly to the selected repeater, bypassing the reflector and origin server
entirely.
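
As a rough illustration of such rewriting, the sketch below rewrites absolute http links in an HTML document so that they designate a repeater. It assumes the `http://<repeater>/<server>/<path>` form that the description gives later for reflected http URLs. The function name `rewrite_links`, the restriction to double-quoted `href` and `src` attributes, and the regular expression itself are simplifying assumptions; the text does not say how identifiers are located within a resource.

```python
import re

# Matches href="..." or src="..." attributes holding absolute http URLs.
_LINK = re.compile(
    r'(?P<attr>\b(?:href|src)=")http://(?P<host>[^/"]+)(?P<path>/[^"]*)?"',
    re.IGNORECASE,
)


def rewrite_links(html: str, origin_host: str, repeater_host: str) -> str:
    """Point links that name origin_host at repeater_host instead."""

    def replace(match: "re.Match[str]") -> str:
        if match.group("host").lower() != origin_host.lower():
            return match.group(0)  # link to some other server: leave it alone
        path = match.group("path") or "/"
        return '%shttp://%s/%s%s"' % (
            match.group("attr"), repeater_host, origin_host, path)

    return _LINK.sub(replace, html)
```

Links to other servers and relative links are left untouched by this sketch.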

Resource rewriting must be performed by reflectors. It may also be performed
by repeaters, in the situation where repeaters "peer" with one another and make copies
of resources which include rewritten resource identifiers that designate a repeater.

In a preferred embodiment, the network is the Internet and the resource
identifier is a uniform resource locator (URL) for designating resources on the Internet,
and the modified resource identifier is a URL designating the repeater and indicating the
origin server (as described in step B3 below), and the modified resource identifier is
provided to the client using a REDIRECT message. Note, only when the reflector is
"reflecting" a request is the modified resource identifier provided using a REDIRECT
message.

In another aspect, this invention is a computer network comprising a plurality of
origin servers, at least some of the origin servers having reflectors associated therewith,
and a plurality of repeaters.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other objects and advantages of the invention will be apparent
upon consideration of the following detailed description, taken in conjunction with the
accompanying drawings, in which the reference characters refer to like parts throughout
and in which:

FIGURE 1 depicts a portion of a network environment according to the present
invention; and

FIGURES 2-6 are flow charts of the operation of the present invention.



DETAILED DESCRIPTION OF THE PRESENTLY PREFERRED EXEMPLARY EMBODIMENTS

Overview

FIGURE 1 shows a portion of a network environment 100 according to the
present invention, wherein a mechanism (reflector 108, described in detail below) at a
server (herein origin server 102) maintains and keeps track of a number of partially
replicated servers or repeaters 104a, 104b, and 104c. Each repeater 104a, 104b, and 104c
replicates some or all of the information available on the origin server 102 as well as
information available on other origin servers in the network 100. Reflector 108 is
connected to a particular repeater known as its "contact" repeater ("Repeater B" 104b in
the system depicted in FIGURE 1). Preferably each reflector maintains a connection with
a single repeater known as its contact, and each repeater maintains a connection with a
special repeater known as its master repeater (e.g., repeater 104m for repeaters 104a,
104b and 104c in FIGURE 1).

Thus, a repeater can be considered as a dedicated proxy server that maintains a
partial or sparse mirror of the origin server 102, by implementing a distributed coherent
cache of the origin server. A repeater may maintain a (partial) mirror of more than one
origin server. In some embodiments, the network 100 is the Internet and repeaters
mirror selected resources provided by origin servers in response to clients' HTTP
(hypertext transfer protocol) and FTP (file transfer protocol) requests.

A client 106 connects, via the network 100, to origin server 102 and possibly to
one or more repeaters 104a etc.

Origin server 102 is a server at which resources originate. More generally, the
origin server 102 is any process or collection of processes that provide resources in
response to requests from a client 106. Origin server 102 can be any off-the-shelf Web
server. In a preferred embodiment, origin server 102 is typically a Web server such as
the Apache server or Netscape Communications Corporation's Enterprise™ server.

Client 106 is a processor requesting resources from origin server 102 on behalf of
an end user. The client 106 is typically a user agent (e.g., a Web browser such as
Netscape Communications Corporation's Navigator™) or a proxy for a user agent.
Components other than the reflector 108 and the repeaters 104a, 104b, etc., may be
implemented using commonly available software programs. In particular, this invention
works with any HTTP client (e.g., a Web browser), proxy cache, and Web server. In
addition, the reflector 108 might be fully integrated into the data server 112 (for instance,
in a Web server). These components might be loosely integrated based on the use of
extension mechanisms (such as so-called add-in modules) or tightly integrated by
modifying the service component specifically to support the repeaters.

Resources originating at the origin server 102 may be static or dynamic. That is,
the resources may be fixed or they may be created by the origin server 102 specifically in
response to a request. Note that the terms "static" and "dynamic" are relative, since a
static resource may change at some regular, albeit long, interval.

Resource requests from the client 106 to the origin server 102 are intercepted by
reflector 108 which for a given request either forwards the request on to the origin server
102 or conditionally reflects it to some repeater 104a, 104b, etc. in the network 100.
That is, depending on the nature of the request by the client 106 to the origin server 102,
the reflector 108 either serves the request locally (at the origin server 102), or selects one
of the repeaters (preferably the best repeater for the job) and reflects the request to the
selected repeater. In other words, the reflector 108 causes requests for resources from
origin server 102, made by client 106, to be either served locally by the origin server 102
or transparently reflected to the best repeater 104a, 104b, etc. The notion of a best
repeater and the manner in which the best repeater is selected are described in detail
below.

Repeaters 104a, 104b, etc. are intermediate processors used to service client
requests, thereby improving performance and reducing costs in the manner described
herein. Within repeaters 104a, 104b, etc., are any processes or collections of processes
that deliver resources to the client 106 on behalf of the origin server 102. A repeater
may include a repeater cache 110, used to avoid unnecessary transactions with the origin
server 102.

The reflector 108 is a mechanism, preferably a software program, that intercepts
requests that would normally be sent directly to the origin server 102. While shown in
the drawings as separate components, the reflector 108 and the origin server 102 are
typically co-located, e.g., on a particular system such as data server 112. (As discussed
below, the reflector 108 may even be a "plug in" module that becomes part of the origin
server 102.)

FIGURE 1 shows only a part of a network 100 according to this invention. A
complete operating network consists of any number of clients, repeaters, reflectors, and
origin servers. Reflectors communicate with the repeater network, and repeaters in the
network communicate with one another.

Uniform Resource Locators 

Each location in a computer network has an address which can generally be
specified as a series of names or numbers. In order to access information, an address for
that information must be known. For example, on the World Wide Web ("the Web"),
which is a subset of the Internet, the manner in which information address locations are
provided has been standardized into Uniform Resource Locators (URLs). URLs specify
the location of resources (information, data files, etc.) on the network.

The notion of URLs becomes even more useful when hypertext documents are
used. A hypertext document is one which includes, within the document itself, links
(pointers or references) to the document itself or to other documents. For example, in
an on-line legal research system, each case may be presented as a hypertext document.
When other cases are cited, links to those cases can be provided. In this way, when a
person is reading a case, they can follow cite links to read the appropriate parts of cited
cases.

In the case of the Internet in general and the World Wide Web in particular,
documents can be created using a standardized form known as the Hypertext Markup
Language (HTML). In HTML, a document consists of data (text, images, sounds, and
the like), including links to other sections of the same document or to other documents.
The links are generally provided as URLs, and can be in relative or absolute form.
Relative URLs simply omit the parts of the URL which are the same as for the
document including the link, such as the address of the document (when linking to the
same document), etc. In general, a browser fills in the missing parts of a URL
using the corresponding parts from the current document, thereby forming a fully
formed URL including a fully qualified domain name, etc.

A hypertext document may contain any number of links to other documents,
and each of those other documents may be on a different server in a different part of the
world. For example, a document may contain links to documents in Russia, Africa,
China and Australia. A user viewing that document at a particular client can follow any
of the links transparently (i.e., without knowing where the document being linked to
actually resides). Accordingly, the cost (in terms of time or money or resource
allocation) of following one link versus another may be quite significant.

URLs generally have the following form (defined in detail in T. Berners-Lee et al.,
Uniform Resource Locators (URL), Network Working Group, Request for Comments: 1738,
Category: Standards Track, December 1994, located at
"http://ds.internic.net/rfc/rfc1738.txt", which is hereby incorporated herein by
reference):

    scheme://host[:port]/url-path

where "scheme" can be a symbol such as "file" (for a file on the local system), "ftp" (for a
file on an anonymous FTP file server), "http" (for a file on a Web server), and
"telnet" (for a connection to a Telnet-based service). Other schemes can also be used
and new schemes are added every now and then. The port number is optional, the
system substituting a default port number (depending on the scheme) if none is
provided. The "host" field maps to a particular network address for a particular
computer. The "url-path" is relative to the computer specified in the "host" field. A
url-path is typically, but not necessarily, the pathname of a file in a web server directory.

For example, the following is a URL identifying a file "F" in the path "A/B/C"
on a computer at "www.uspto.gov":

    http://www.uspto.gov/A/B/C/F
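
The decomposition of a URL into these fields can be illustrated with Python's standard `urllib.parse` module. The helper name `split_url` and the small `DEFAULT_PORTS` table are assumptions made for this sketch; the table lists only a few well-known scheme defaults and is not exhaustive.

```python
from urllib.parse import urlsplit

# Default ports for a few schemes; the table is illustrative, not exhaustive.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "telnet": 23}


def split_url(url: str):
    """Return (scheme, host, port, url_path) for scheme://host[:port]/url-path."""
    parts = urlsplit(url)
    # Use the explicit port if present, otherwise the default for the scheme.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    # The url-path is relative to the host, so drop the single separating slash.
    url_path = parts.path[1:] if parts.path.startswith("/") else parts.path
    return parts.scheme, parts.hostname, port, url_path
```

Applied to the example URL, the sketch yields the scheme "http", the host "www.uspto.gov", the default port 80, and the url-path "A/B/C/F".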

In order to access the file "F" (the resource) specified by the above URL, a
program (e.g., a browser) running on a user's computer (i.e., a client computer) would
have to first locate the computer (i.e., a server computer) specified by the host name.
I.e., the program would have to locate the server "www.uspto.gov". To do this, it would
access a Domain Name Server (DNS), providing the DNS with the host name
("www.uspto.gov"). The DNS acts as a kind of centralized directory for resolving
addresses from names. If the DNS determines that there is a (remote server) computer
corresponding to the name "www.uspto.gov", it will provide the program with an actual
computer network address for that server computer. On the Internet this is called an
Internet Protocol (or IP) address and it has the form "123.345.456.678". The program
on the user's (client) computer would then use the actual address to access the remote
(server) computer.

The program opens a connection to the HTTP server (Web server) on the
remote computer "www.uspto.gov" and uses the connection to send a request message to
the remote computer (using the HTTP scheme). The message is typically an HTTP
GET request which includes the url-path of the requested resource, "A/B/C/F". The
HTTP server receives the request and uses it to access the resource specified by the
url-path "A/B/C/F". The server returns the resource over the same connection.

Thus, conventionally, HTTP client requests for Web resources at an origin server
102 are processed as follows (see FIGURE 2). (This is a description of the process when
no reflector 108 is installed.):

A1. A browser (e.g., Netscape's Navigator) at the client receives a resource
    identifier (i.e., a URL) from a user.

A2. The browser extracts the host (origin server) name from the resource
    identifier, and uses a domain name server (DNS) to look up the network
    (IP) address of the corresponding server. The browser also extracts a
    port number, if one is present, or uses a default port number (the default
    port number for http requests is 80).

A3. The browser uses the server's network address and port number to
    establish a connection between the client 106 and the host or origin
    server 102.

A4. The client 106 then sends a (GET) request over the connection
    identifying the requested resource.

A5. The origin server 102 receives the request and

A6. locates or composes the corresponding resource.

A7. The origin server 102 then sends back to the client 106 a reply containing
    the requested resource (or some form of error indicator if the resource is
    unavailable). The reply is sent to the client over the same connection as
    that on which the request was received from the client.

A8. The client 106 receives the reply from the origin server 102.
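
The client side of steps A2 through A4 can be sketched without any network I/O by computing whom the client would contact and what it would send. The function name `plan_request` and the choice of an HTTP/1.0 request line with a `Host` header are assumptions made for illustration; the text only says that a GET request identifying the resource is sent.

```python
from urllib.parse import urlsplit


def plan_request(url: str):
    """Steps A2-A4 without network I/O: whom to contact and what to send."""
    parts = urlsplit(url)
    host = parts.hostname          # A2: host (origin server) name
    port = parts.port or 80        # A2: explicit port, else the http default
    path = parts.path or "/"
    # A4: a GET request identifying the requested resource.
    request = "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)
    return host, port, request.encode("ascii")
```

In a live client, the host name would then be resolved and the connection of step A3 opened (for example with the standard library's `socket.create_connection`), the request bytes written to it, and the reply of steps A7 and A8 read back over the same connection.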

There are many variations of this basic model. For example, in one variation,
instead of providing the client with the resource, the origin server can tell the client to
re-request the resource by another name. To do so, in A7 the server 102 sends back to
the client 106 a reply called a "REDIRECT" which contains a new URL indicating the
other name. The client 106 then repeats the entire sequence, normally without any user
intervention, this time requesting the resource identified by the new URL.



System Operation

In this invention, reflector 108 effectively takes the place of an ordinary Web
server or origin server 102. The reflector 108 does this by taking over the origin server's
IP address and port number. In this way, when a client tries to connect to the origin
server 102, it will actually connect to the reflector 108. The original Web server (or
origin server 102) must then accept requests at a different network (IP) address, or at the
same IP address but on a different port number. Thus, using this invention, the server
referred to in A3-A7 above is actually a reflector 108.

Note that it is also possible to leave the origin server's network address as it is
and to let the reflector run at a different address or on a different port. In this way the
reflector does not intercept requests sent to the origin server, but can still be sent
requests addressed specifically to the reflector. Thus the system can be tested and
configured without interrupting its normal operation.

The reflector 108 supports the processing as follows (see FIGURE 3):
upon receipt of a request,

B1. The reflector 108 analyzes the request to determine whether or not to
    reflect the request. To do this, first the reflector determines whether the
    sender (client 106) is a browser or a repeater. Requests issued by
    repeaters must be served locally by the origin server 102. This
    determination can be made by looking up the network (IP) address of
    the sender in a list of known repeater network (IP) addresses.
    Alternatively, this determination could be made by attaching information
    to a request to indicate that the request is from a specific repeater, or
    repeaters can request resources from a special port, other than the one
    used for ordinary clients.

B2. If the request is not from a repeater, the reflector looks up the requested
    resource in a table (called the "rule base") to determine whether the
    resource requested is "repeatable". Based on this determination, the
    reflector either reflects the request (B3, described below) or serves the
    request locally (B4, described below).

    The rule base is a list of regular expressions and associated
    attributes. (Regular expressions are well-known in the field of computer
    science. A small bibliography of their use is found in Aho, et al.,
    "Compilers, Principles, techniques and tools", Addison-Wesley,
    pp. 157-158.) The resource identifier (URL) for a given request is looked
    up in the rule base by matching it sequentially with each regular
    expression. The first match identifies the attributes for the resource,
    namely repeatable or local. If there is no match in the rule base, a default
    attribute is used. Each reflector has its own rule base, which is manually
    configured by the reflector operator.

B3. To reflect a request (to serve a request locally, go to B4),
    as shown in FIGURE 4, the reflector determines (B3-1) the best repeater
    to reflect the request to, as described in detail below. The reflector then
    creates (B3-2) a new resource identifier (URL) (using the requested URL
    and the best repeater) that identifies the same resource at the selected
    repeater.

    It is necessary that the reflection step create a single URL
    containing the URL of the original resource, as well as the identity of the
    selected repeater. A special form of URL is created to provide this
    information. This is done by creating a new URL as follows:



    D1. Given a repeater name, scheme, origin server name and path, create a
    new URL. If the scheme is "http", the preferred embodiment uses the
    following format:

        http://<repeater>/<server>/<path>

    If the scheme is other than "http", the preferred embodiment uses the
    following format:

        http://<repeater>/<server>@proxy-<scheme>@/<path>

    The reflector can also attach a MIME type to the request, to cause the
    repeater to provide that MIME type with the result. This is useful
    because many protocols (such as FTP) do not provide a way to attach a
    MIME type to a resource. The format is

        http://<repeater>/<server>@proxy-<scheme>:<type>@/<path>

    This URL is interpreted when received by the repeater.

    The reflector then sends (B3-3) a REDIRECT reply containing
    this new URL to the requesting client. The HTTP REDIRECT
    command allows the reflector to send the browser a single URL to retry
    the request.

B4. To serve a request locally, the request is sent by the reflector to
    ("forwarded to") the origin server 102. In this mode, the reflector acts as
    a reverse proxy server. The origin server 102 processes the request in the
    normal manner (A5-A7). The reflector then obtains the origin server's
    reply to the request, which it inspects to determine if the requested
    resource is an HTML document, i.e., whether the requested resource is
    one which itself contains resource identifiers.

B5. If the resource is an HTML document then the reflector rewrites the
    HTML document by modifying resource identifiers (URLs) within it, as
    described below. The resource, possibly as modified by rewriting, is then
    returned in a reply to the requesting client 106.

    If the requesting client is a repeater, the reflector may temporarily
    disable any cache-control modifiers which the origin server attached to
    the reply. These disabled cache-control modifiers are later re-enabled
    when the content is served from the repeater. This mechanism makes it
    possible for the origin server to prevent resources from being cached at
    normal proxy caches, without affecting the behavior of the cache at the
    repeater.

B6. Whether the request is reflected or handled locally, details about the
    transaction, such as the current time, the address of the requester, the
    URL requested, and the type of response generated, are written by the
    reflector to a local log file.
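
The reflecting path through steps B1 to B3 and B6 can be sketched together as follows. This is a minimal illustration under several assumptions, not the described implementation: all names (`RuleBase`, `make_reflected_url`, `redirect_reply`, `reflect_or_none`) are hypothetical; the rule base is matched with an unanchored regular-expression search and defaults to "local"; the `proxy-` token in non-http URLs is copied as it appears in the formats above; applying the MIME-type form to an http URL is a guess; an HTTP 302 response with a `Location` header is assumed as the REDIRECT reply; and the log is an in-memory list rather than a file. The best repeater is passed in, since its selection is described elsewhere.

```python
import re
import time
from typing import Iterable, List, Optional, Pattern, Set, Tuple

REPEATABLE, LOCAL = "repeatable", "local"


class RuleBase:
    """Ordered (regular expression, attribute) pairs; the first match wins."""

    def __init__(self, rules: Iterable[Tuple[str, str]], default: str = LOCAL) -> None:
        self._rules: List[Tuple[Pattern[str], str]] = [
            (re.compile(pattern), attribute) for pattern, attribute in rules
        ]
        self._default = default

    def lookup(self, url: str) -> str:
        for pattern, attribute in self._rules:  # B2: match sequentially
            if pattern.search(url):
                return attribute
        return self._default  # no match: use the default attribute


def make_reflected_url(repeater: str, scheme: str, server: str, path: str,
                       mime_type: Optional[str] = None) -> str:
    """D1: build the URL naming both the repeater and the origin server."""
    path = path.lstrip("/")
    if scheme == "http" and mime_type is None:
        return "http://%s/%s/%s" % (repeater, server, path)
    proxy = "proxy-%s" % scheme
    if mime_type is not None:
        proxy += ":%s" % mime_type
    return "http://%s/%s@%s@/%s" % (repeater, server, proxy, path)


def redirect_reply(new_url: str) -> bytes:
    """B3-3: a minimal HTTP redirect response carrying the new URL."""
    head = ["HTTP/1.0 302 Moved Temporarily", "Location: %s" % new_url,
            "Content-Length: 0", "", ""]
    return "\r\n".join(head).encode("ascii")


def reflect_or_none(sender_ip: str, server: str, path: str, rules: RuleBase,
                    known_repeaters: Set[str], best_repeater: str,
                    log: List[str]) -> Optional[bytes]:
    """Return a redirect reply if the request is reflected, else None (serve locally)."""
    reflect = (sender_ip not in known_repeaters          # B1: repeaters served locally
               and rules.lookup(path) == REPEATABLE)     # B2: consult the rule base
    reply = None
    if reflect:
        reply = redirect_reply(make_reflected_url(best_repeater, "http", server, path))
    # B6: record the transaction whether it was reflected or handled locally.
    log.append("%d %s %s %s" % (time.time(), sender_ip, path,
                                "reflected" if reflect else "local"))
    return reply
```

A `None` result corresponds to the local path of steps B4 and B5, which this sketch does not cover.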

By using a rule base (B2), it is possible to selectively reflect resources. There are 
a number of reasons that certain particular resources cannot be effectively repeated (and 
therefore should not be reflected), for instance: 

- the resource is composed uniquely for each request;

- the resource relies on a so-called cookie (browsers will not send cookies
  to repeaters with different domain names);

- the resource is actually a program (such as a Java applet) that will run on
  the client and that wishes to connect to a service (Java requires that the
  service be running on the same machine that provided the applet).

In addition, the reflector 108 can be configured so that requests from certain
network addresses (e.g., requests from clients on the same local area network as the
reflector itself) are never reflected. Also, the reflector may choose not to reflect requests
because the reflector is exceeding its committed aggregate information rate, as described
below.

A request which is reflected is automatically mirrored at the repeater when the
repeater receives and processes the request.

The combination of the reflection process described here and the caching
process described below effectively creates a system in which repeatable resources are
migrated to and mirrored at the selected reflector, while non-repeatable resources are
not mirrored.

Alternate Approach 

Placing the origin server name in the reflected URL is generally a good strategy,
but it may be considered undesirable for aesthetic or (in the case, e.g., of cookies) certain
technical reasons.

It is possible to avoid the need for placing both the repeater name and the server
name in the URL. Instead, a "family" of names may be created for a given origin server,
each name identifying one of the repeaters used by that server.
For instance, if www.example.com is the origin server, names for three repeaters
might be created:

wr1.example.com
wr2.example.com
wr3.example.com

The name "wr1.example.com" would be an alias for repeater 1, which might also
be known by other names, such as "wr1.anotherExample.com" and "wr1.example.edu".

If the repeater can determine by which name it was addressed, it can use this
information (along with a table that associates repeater alias names with origin server
names) to determine which origin server is being addressed. For instance, if repeater 1 is
addressed as "wr1.example.com", then the origin server is "www.example.com"; if it is
addressed as "wr1.anotherExample.com", then the origin server is
"www.anotherExample.com".

The repeater can use two mechanisms to determine by which alias it is
addressed:

1. Each alias can be associated with a different IP address. Unfortunately,
this solution does not scale well, as IP addresses are currently scarce, and
the number of IP addresses required grows as the product of origin
servers and repeaters.

2. The repeater can attempt to determine the alias name used by inspecting
the "host:" tag in the HTTP header of the request. Unfortunately, some
old browsers still in use do not attach the "host:" tag to a request.
Reflectors would need to identify such browsers (the browser identity is
a part of each request) and avoid this form of reflection.
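
The second mechanism can be sketched roughly as follows. This is only an illustration under stated assumptions: the table contents, the function name `origin_for_request`, and the representation of request headers as a dictionary are all hypothetical and are not taken from the specification.

```python
# Hypothetical alias table: repeater alias name -> origin server name.
# Keys are lower-cased because host names are case-insensitive.
ALIAS_TO_ORIGIN = {
    "wr1.example.com": "www.example.com",
    "wr1.anotherexample.com": "www.anotherExample.com",
}


def origin_for_request(headers):
    """Return the origin server implied by the "host:" tag, or None.

    `headers` is assumed to be a dict of HTTP header names to values.
    None is returned when the tag is missing (as with some old browsers)
    or when the alias is not in the table.
    """
    host = None
    for name, value in headers.items():
        if name.lower() == "host":
            host = value
            break
    if host is None:
        return None
    # Drop an optional ":port" suffix before the table lookup.
    alias = host.split(":", 1)[0].strip().lower()
    return ALIAS_TO_ORIGIN.get(alias)
```

A request without the tag yields None, which is the case in which a reflector would have to avoid this form of reflection.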

How a Repeater Handles a Request

When a browser receives a REDIRECT response (as produced in B3), it reissues
a request for the resource using the new resource identifier (URL) (A1-A5). Because the
new identifier refers to a repeater instead of the origin server, the browser now sends a
request for the resource to the repeater, which processes a request as follows, with
reference to FIGURE 5:

C1. First the repeater analyzes the request to determine the network address
of the requesting client and the path of the resource requested. Included
in the path is an origin server name (as described above with reference to
B3).

C2. The repeater uses an internal table to verify that the origin server belongs
to a known "subscriber". A subscriber is an entity (e.g., a company) that
publishes resources (e.g., files) via one or more origin servers. When the
entity subscribes, it is permitted to utilize the repeater network. The
subscriber tables described below include the information that is used to
link reflectors to subscribers.

If the request is not for a resource from a known subscriber, the
request is rejected. To reject a request, the repeater returns a reply
indicating that the requested resource does not exist.

C3. The repeater then determines whether the requested resource is cached
locally. If the requested resource is in the repeater's cache it is retrieved.
On the other hand, if a valid copy of the requested resource is not in the
repeater's cache, the repeater modifies the incoming URL, creating a
request that it issues directly to the originating reflector which processes
it (as in B1-B6). Because this request to the originating reflector is from
a repeater, the reflector always returns the requested resource rather than
reflecting the request. (Recall that reflectors always handle requests from
repeaters locally.) If the repeater obtained the resource from the origin
server, the repeater then caches the resource locally.

If a resource is not cached locally, the cache can query its "peer
caches" to see if one of them contains the resource, before or at the
same time as requesting the resource from the reflector/origin server. If
a peer cache responds in a limited period of time (preferably a
small fraction of a second), the resource will be retrieved from the peer
cache.

C4. The repeater then constructs a reply including the requested resource
(which was retrieved from the cache or from the origin server) and sends
that reply to the requesting client.

C5. Details about the transaction, such as the associated reflector, the current
time, the address of the requester, the URL requested, and the type of
response generated, are written to a local log file at the repeater.

Note that the row of FIGURE 2 refers to an origin server, or a reflector, or a
repeater, depending on what the URL in step A1 identifies.
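
Steps C1 through C5 can be sketched as a small class. This is a simplified illustration, not the implementation described in the specification: the class name `Repeater`, the path layout `/<origin server>/<resource path>`, the numeric status codes, and the `fetch_from_reflector` callable are assumptions, and peer caches and cache validity checks are omitted.

```python
class Repeater:
    """Minimal sketch of repeater request handling (C1-C5)."""

    def __init__(self, subscribers, fetch_from_reflector):
        self.subscribers = set(subscribers)   # known origin server names
        self.fetch = fetch_from_reflector     # callable(origin, path) -> bytes
        self.cache = {}                       # (origin, resource) -> bytes
        self.log = []                         # stand-in for the local log file

    def handle(self, client_addr, path):
        # C1: split the path into origin server name and resource path.
        # The "/<origin>/<resource>" layout is an assumption.
        origin, _, resource = path.lstrip("/").partition("/")

        # C2: reject requests for origin servers of unknown subscribers by
        # replying that the resource does not exist.
        if origin not in self.subscribers:
            self.log.append((client_addr, path, 404))
            return 404, b""

        # C3: serve from the local cache, otherwise ask the reflector and
        # cache the result.
        key = (origin, resource)
        if key not in self.cache:
            self.cache[key] = self.fetch(origin, "/" + resource)
        body = self.cache[key]

        # C4 and C5: reply and record the transaction.
        self.log.append((client_addr, path, 200))
        return 200, body
```

A second request for the same resource is answered from the cache without contacting the reflector again.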



Selecting the Best Repeater

If the reflector 108 determines that it will reflect the request, it must then select
the best repeater to handle that request (as referred to in step B3-1). This selection is
performed by the Best Repeater Selector (BRS) mechanism described here.
The goal of the BRS is to select, quickly and heuristically, an appropriate repeater
for a given client given only the network address of the client. An appropriate repeater
is one which is not too heavily loaded and which is not too far from the client in terms
of some measure of network distance. The mechanism used here relies on specific,
compact, pre-computed data to make a fast decision. Other, dynamic solutions can also
be used to select an appropriate repeater.

The BRS relies on three pre-computed tables, namely the Group Reduction
Table, the Link Cost Table, and the Load Table. These three tables (described below)
are computed off-line and downloaded to each reflector by its contact in the repeater
network.

The Group Reduction Table places every network address into a group, with
the goal that addresses in a group share relative costs, so that they would have the same
best repeater under varying conditions (i.e., the BRS is invariant over the members of
the group).

The Link Cost Table is a two dimensional matrix which specifies the current
cost between each repeater and each group. Initially, the link cost between a repeater
and a group is defined as the "normalized link cost" between the repeater and the group,
as defined below. Over time, the table will be updated with measurements which more
accurately reflect the relative cost of transmitting a file between the repeater and a
member of the group. The format of the Link Cost Table is <Group ID> <Group
ID> <link cost>, where the Group ID's are given as AS numbers.

The Load Table is a one dimensional table which identifies the current load at
each repeater. Because repeaters may have different capacities, the load is a value that
represents the ability of a given repeater to accept additional work. Each repeater sends
its current load to a central master repeater at regular intervals, preferably at least
approximately once a minute. The master repeater broadcasts the Load Table to each
reflector in the network, via the contact repeater.

A reflector is provided entries in the Load Table only for repeaters which it is
assigned to use. The assignment of repeaters to reflectors is performed centrally by a
repeater network operator at the master repeater. This assignment makes it possible to
modify the service level of a given reflector. For instance, a very active reflector may use
many repeaters, whereas a relatively inactive reflector may use few repeaters.

Tables may also be configured to provide selective repeater service to subscribers
in other ways, e.g., for their clients in specific geographic regions, such as Europe or
Asia.

Measuring Load

In the presently preferred embodiments, repeater load is measured in two
dimensions, namely

1. requests received by the repeater per time interval (RRPT), and

2. bytes sent by the repeater per time interval (BSPT).

For each of these dimensions, a maximum capacity setting is set. The maximum
capacity indicates the point at which the repeater is considered to be fully loaded. A
higher RRPT capacity generally indicates a faster processor, whereas a higher BSPT
capacity generally indicates a wider network pipe. This form of load measurement
assumes that a given server is dedicated to the task of repeating.

Each repeater regularly calculates its current RRPT and BSPT, by accumulating
the number of requests received and bytes sent over a short time interval. These
measurements are used to determine the repeater's load in each of these dimensions. If
a repeater's load exceeds its configured capacity, an alarm message is sent to the repeater
network administrator.

The two current load components are combined into a single value indicating
overall current load. Similarly, the two maximum capacity components are combined
into a single value indicating overall maximum capacity. The components are combined
as follows:

current-load = B × current RRPT + ( 1 - B ) × current BSPT
max-load = B × max RRPT + ( 1 - B ) × max BSPT

The factor B, a value between 0 and 1, allows the relative weights of RRPT and
BSPT to be adjusted, which favors consideration of either processing power or
bandwidth.

The overall current load and overall maximum capacity values are periodically
sent from each repeater to the master repeater, where they are aggregated in the Load
Table, a table summarizing the overall load for all repeaters. Changes in the Load Table
are distributed automatically to each reflector.

While the preferred embodiment uses a two-dimensional measure of repeater
load, any other measure of load can be used.
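
As a small worked illustration of the weighting, the function below combines the two load dimensions with the factor B. The function name and the sample figures are hypothetical; the text does not say whether RRPT and BSPT are rescaled to comparable units before being combined, so the raw values are used here as an assumption.

```python
def combined_load(rrpt, bspt, b):
    """Combine requests-per-interval and bytes-per-interval into one value.

    `b` is the weighting factor B (0 <= b <= 1). b = 1 considers only
    processing load (RRPT); b = 0 considers only bandwidth (BSPT).
    """
    if not 0.0 <= b <= 1.0:
        raise ValueError("b must be between 0 and 1")
    return b * rrpt + (1.0 - b) * bspt


# Hypothetical measurements for one repeater.
current_load = combined_load(rrpt=100, bspt=2000, b=0.5)
max_load = combined_load(rrpt=400, bspt=8000, b=0.5)
```

The same function is applied to the current measurements and to the configured maximum capacities, so the two results are directly comparable.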



Combining Link Costs and Load 

The BRS computes the cost of servicing a given client from each eligible
repeater. The cost is computed by combining the available capacity of the candidate
repeater with the cost of the link between that repeater and the client. The link cost is
computed by simply looking it up in the Link Cost Table.
The cost is determined using the following formula:

threshold = K * max-load
capacity = max( max-load - current-load, e )
capacity = min( capacity, threshold )
cost = link-cost * threshold / capacity

In this formula, e is a very small number (epsilon) and K is a tuning factor initially
set to 0.5. This formula causes the cost to a given repeater to be increased, at a rate
defined by K, if its capacity falls below a configurable threshold.
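
The formula translates directly into a few lines of code. The function name and the particular value chosen for epsilon are assumptions made for this sketch.

```python
def repeater_cost(link_cost, max_load, current_load, k=0.5, eps=1e-9):
    """Combined cost of serving a client from one repeater.

    While spare capacity is at or above the threshold (k * max_load) the
    cost equals the link cost; below the threshold the cost grows in
    inverse proportion to the remaining capacity.
    """
    threshold = k * max_load
    capacity = max(max_load - current_load, eps)   # never zero or negative
    capacity = min(capacity, threshold)            # cap at the threshold
    return link_cost * threshold / capacity
```

For example, with a link cost of 10 and a maximum load of 100, a repeater carrying a load of 20 costs 10, while the same repeater carrying a load of 75 has only 25 units of spare capacity and costs 20.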

Given the cost of each candidate repeater, the BRS selects all repeaters within a
delta factor of the best score. From this set, the result is selected at random.

The delta factor prevents the BRS from repeatedly selecting a single repeater
when scores are similar. It is generally required because available information about load
and link costs loses accuracy over time. This factor is tunable.

Best Repeater Selector (BRS)

The BRS operates as follows, with reference to FIGURE 6:
Given a client network address and the three tables described above:

E1. Determine which group the client is in using the Group Reduction
Table.

E2. For each repeater in the Link Cost Table and Load Table, determine that
repeater's combined cost as follows:

E2a. Determine the maximum and current load on the repeater (using
the Load Table).

E2b. Determine the link cost between the repeater and the client's
group (using the Link Cost Table).
E2c. Determine the combined cost as described above.

E3. Select a small set of repeaters with the lowest cost.

E4. Select a random member from the set.
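
Steps E1 through E4 might be sketched as follows. The data layouts (a dictionary keyed by repeater and group for the Link Cost Table, a dictionary of maximum and current load pairs for the Load Table), the callables `group_of` and `cost_fn`, and the reading of "within a delta factor" as "at most delta times the best cost" are all assumptions made for illustration.

```python
import random


def best_repeater(client_addr, group_of, link_cost, load, cost_fn,
                  delta=1.1, rng=random):
    """Pick a repeater for a client address, or None if none is eligible.

    group_of:  callable mapping an address to its group (E1)
    link_cost: dict {(repeater, group): link cost}
    load:      dict {repeater: (max_load, current_load)}
    cost_fn:   callable(link_cost, max_load, current_load) -> combined cost
    """
    group = group_of(client_addr)                               # E1
    costs = {}
    for repeater, (max_load, current_load) in load.items():     # E2
        lc = link_cost.get((repeater, group))
        if lc is None:
            continue                     # no link cost known for this pair
        costs[repeater] = cost_fn(lc, max_load, current_load)
    if not costs:
        return None
    best = min(costs.values())                                  # E3
    candidates = sorted(r for r, c in costs.items() if c <= best * delta)
    return rng.choice(candidates)                               # E4
```

Choosing at random among near-equal candidates spreads requests across them instead of sending every client to a single repeater.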

Preferably the results of the BRS processing are maintained in a local cache at
the reflector 108. Thus, if the best repeater has recently been determined for a given
client (i.e., for a given network address), that best repeater can be reused quickly without
being re-determined. Since the calculation described above is based on statically pre-computed
tables, if the tables have not changed then there is no need to re-determine
the best repeater.

Determining the Group Reduction and Link Cost Tables

The Group Reduction Table and Link Cost Table used in BRS processing are 
created and regularly updated by an independent procedure referred to herein as 
NetMap. The NetMap procedure is run by executing several phases (described below) as
needed. 

The term Group is used here to refer to an IP "address group".
The term Repeater Group refers to a Group that contains the IP address of a
repeater.

The term link cost refers to a statically determined cost for transmitting data
between two Groups. In a presently preferred implementation, this is the minimum of
the sums of the costs of the links along each path between them. The link costs of
primary concern here are link costs between a Group and a Repeater Group.
The term relative link cost refers to the link cost relative to other link costs for the
same Group, which is calculated by subtracting the minimum link cost from a Group to
any Repeater Group from each of its link costs to a Repeater Group.
The term Cost Set refers to a set of Groups that are equivalent in regard to Best
Repeater Selection. That is, given the information available, the same repeater
would be selected for any of them.

The NetMap procedure first processes input files to create an internal database
called the Group Registry. These input files describe groups, the IP addresses within
groups, and links between groups, and come from a variety of sources, including publicly
available Internet Routing Registry (IRR) databases, BGP router tables, and probe
services that are located at various points around the Internet and use publicly available
tools (such as "traceroute") to sample data paths. Once this processing is complete, the
Group Registry contains essential information used for further processing, namely (1)
the identity of each group, (2) the set of IP addresses in a given group, (3) the presence
of links between groups indicating paths over which information may travel, and (4) the
cost of sending data over a given link.

The following processes are then performed.

Calculate Repeater Group link costs

The NetMap procedure calculates a "link cost" for transmission of data between
each Repeater Group and each Group in the Group Registry. This overall link cost is
defined as the minimum cost of any path between the two groups, where the cost of a
path is equal to the sum of the costs of the individual links in the path. The link cost
algorithm presented below is essentially the same as algorithm #562 from ACM journal
Transactions on Mathematical Software: "Shortest Path From a Specific Node to All
Other Nodes in a Network" by U. Pape, ACM TOMS,
http://www.netlib.org/toms/562.
In this processing, the term Repeater Group refers to a Group that contains the
IP address of a repeater. A group is a neighbor of another group if the Group Registry
indicates that there is a link between the two groups.

• For each target Repeater Group T:

    • Initialize the link cost between T and itself to zero.

    • Initialize the link cost between T and every other Group to infinity.

    • Create a list L that will contain Groups that are equidistant from the target
      Repeater Group T.

    • Initialize the list L to contain just the target Repeater Group T itself.

    • While the list L is not empty:

        • Create an empty list L' of neighbors of members of the list L.

        • For each Group G in the list L:

            • For each Group N that is a neighbor of G:

                • Let cost refer to the sum of the link cost between T and
                  G and the link cost between G and N.
                  The cost between T and G was determined in the
                  previous pass of the algorithm; the link cost between G
                  and N is from the Group Registry.

                • If cost is less than the link cost between T and N:

                    • Set the link cost between T and N to cost.

                    • Add N to L' if it is not already on it.

        • Set L to L'.
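
The procedure above can be sketched for a single target Repeater Group as shown below. The adjacency-dictionary representation of the Group Registry links and the function name are assumptions. One small difference from the description: the sketch reads the current cost of G when expanding it, which may already have been improved during the same pass; the final costs are the same.

```python
import math


def link_costs_from(target, links):
    """Minimum path cost from `target` to every reachable group.

    links: dict {group: {neighbor: cost of the direct link}}
    Returns a dict {group: cost}; groups that cannot be reached keep a
    cost of infinity.
    """
    cost = {group: math.inf for group in links}
    cost[target] = 0
    frontier = [target]                      # the list L
    while frontier:
        next_frontier = []                   # the list L'
        for g in frontier:
            for n, link in links.get(g, {}).items():
                candidate = cost[g] + link
                if candidate < cost.get(n, math.inf):
                    cost[n] = candidate
                    if n not in next_frontier:
                        next_frontier.append(n)
        frontier = next_frontier
    return cost
```

Running the function once per Repeater Group yields the link costs from every Group to every Repeater Group.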



Calculate Cost Sets 

A Cost Set is a set of Groups that are equivalent with respect to Best Repeater
Selection. That is, given the information available, the same repeater would be selected
for any of them. 

The "cost profile" of a Group G is defined herein as the set of costs between G
and each Repeater. Two cost profiles are said to be equivalent if the values in one
profile differ from the corresponding values in the other profile by a constant amount.

Once a client Group is known, the Best Repeater Selection algorithm relies on
the cost profile for information about the Group. If two cost profiles are equivalent, the
BRS algorithm would select the same repeater given either profile.

A Cost Set is then a set of groups that have equivalent cost profiles.
The effectiveness of this method can be seen, for example, in the case where all
paths to a Repeater from some Group A pass through some other Group B. The two
Groups have equivalent cost profiles (and are therefore in the same Cost Set) since
whatever Repeater is best for Group A is also going to be best for Group B, regardless
of what path is taken between the two Groups.

By normalizing cost profiles, equivalent cost profiles can be made identical. A
normalized cost profile is a cost profile in which the minimum cost has the value zero.
A normalized cost profile is computed by finding the minimum cost in the profile, and
subtracting that value from each cost in the profile.
Cost Sets are then computed using the following algorithm:

• For each Group G:

    • Calculate the normalized cost profile for G.

    • Look for a Cost Set with the same normalized cost profile.

    • If such a set is found, add G to the existing Cost Set;

    • otherwise, create a new Cost Set with the calculated normalized cost profile,
      containing only G.

The algorithm for finding Cost Sets employs a hash table to reduce the time
necessary to determine whether the desired Cost Set already exists. The hash table uses
a hash value computed from the cost profile of G.

Each Cost Set is then numbered with a unique Cost Set Index number. Cost
Sets are then used in a straightforward manner to generate the Link Cost Table, which
gives the cost from each Cost Set to each Repeater.

As described below, the Group Reduction Table maps every IP address to one
of these Cost Sets.
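
The grouping into Cost Sets can be sketched with a dictionary playing the role of the hash table. The function name, the representation of a cost profile as a tuple of costs in a fixed repeater order, and the numbering of Cost Sets from zero are assumptions.

```python
def build_cost_sets(profiles):
    """Group groups whose normalized cost profiles are identical.

    profiles: dict {group: tuple of costs, one per repeater, fixed order}
    Returns (index_of, members):
      index_of: dict {normalized profile: Cost Set index}
      members:  list where members[i] is the list of groups in Cost Set i
    """
    index_of = {}
    members = []
    for group, profile in profiles.items():
        lowest = min(profile)
        normalized = tuple(c - lowest for c in profile)
        index = index_of.get(normalized)       # hash table lookup
        if index is None:
            index = len(members)               # new Cost Set
            index_of[normalized] = index
            members.append([])
        members[index].append(group)
    return index_of, members
```

Profiles (3, 5, 9) and (1, 3, 7) both normalize to (0, 2, 6), so the two groups fall into the same Cost Set.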

Build IP Map 

The IP Map is a sorted list of records which map IP address ranges to Link Cost
Table keys. The format of the IP map is:

<base IP address> <max IP address> <Link Cost Table key>
where IP addresses are presently represented by 32-bit integers. The entries are sorted by
descending base address, and by ascending maximum address among equal base
addresses, and by ascending Link Cost Table key among equal base addresses and
maximum addresses. Note that ranges may overlap.

The NetMap procedure generates an intermediate IP map containing a map
between IP address ranges and Cost Set numbers as follows:
• For each Cost Set S:

    • For each Group G in S:

        • For each IP address range in G:

            • Add a triple (low address, high address, Cost Set number of
              S) to the IP map.

The IP map file is then sorted by descending base address, and by ascending 
maximum address among equal base addresses, and by ascending Cost Set number 
among equal base addresses and maximum addresses. The sort order for the base 
. address and maximum address minimizes the time to build the Group Reduction Table 
and produces the proper results for overlapping entries. 

Finally, the NetMap procedure creates the Group Reduction Table by processing 
the sorted IP map. The Group Reduction Table maps IP addresses (specified by ranges)
into Cost Set numbers. Special processing of the IP map file is required in order to
detect overlapping address ranges, and to merge adjacent address ranges in order to
minimize the size of the Group Reduction Table.

An ordered list of address range segments is maintained, each segment consisting
of a base address B and a Cost Set number N, sorted by base address B. (The
maximum address of a segment is the base address of the next segment minus one.)
The following algorithm is used:

• Initialize the list with the elements [-infinity, NOGROUP], [+infinity, NOGROUP].

• For each entry in the IP map, in sorted order, consisting of (b, m, s):

    • Insert (b, m, s) in the list (recall that IP map entries are of the form
      (low address, high address, Cost Set number of S)).

• For each reserved LAN address range (b, m):

    • Insert (b, m, LOCAL) in the list.

• For each Repeater at address a:

    • Insert (a, a, REPEATER) in the list.

• For each segment S in the ordered list:

    • Merge S with following segments with the same Cost Set.

    • Create a Group Reduction Table entry with

        • base address = the base address of S,

        • max address = next segment's base - 1,

        • group ID = Cost Set number of S.
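
The final pass over the ordered segment list, together with a lookup in the resulting table, might be sketched as follows. This covers only the merge step, not the insertion procedure. The function names, the use of finite addresses instead of the -infinity and +infinity sentinels, the `top` parameter for the highest address, and the value chosen for NOGROUP are assumptions.

```python
import bisect

NOGROUP = -1   # hypothetical sentinel for "no known mapping"


def build_group_reduction_table(segments, top):
    """Merge adjacent segments with the same Cost Set into table entries.

    segments: list of (base address, Cost Set number), sorted by base
    top:      highest address covered by the last segment
    Returns a list of (base address, max address, group ID) tuples.
    """
    merged = []
    for base, cost_set in segments:
        if merged and merged[-1][1] == cost_set:
            continue                  # same Cost Set: extends previous entry
        merged.append((base, cost_set))
    table = []
    for i, (base, cost_set) in enumerate(merged):
        # The maximum address is the next segment's base minus one.
        max_addr = merged[i + 1][0] - 1 if i + 1 < len(merged) else top
        table.append((base, max_addr, cost_set))
    return table


def lookup(table, addr):
    """Return the group ID for an address using binary search."""
    bases = [entry[0] for entry in table]
    i = bisect.bisect_right(bases, addr) - 1
    if i < 0 or addr > table[i][1]:
        return NOGROUP
    return table[i][2]
```

Because the entries are sorted by base address and do not overlap, a binary search is sufficient for the lookup performed in step E1.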

A reserved LAN address range is an address range reserved for use by LANs
which should not appear as a global Internet address. LOCAL is a special Cost Set
index different from all others, indicating that the range maps to a client which should
never be reflected. REPEATER is a special Cost Set index different from all others,
indicating that the address range maps to a repeater. NOGROUP is a special Cost Set
index different from all others, indicating that this range of addresses has no known
mapping.

Given (B, M, N), insert an entry in the ordered address list as follows:
Find the last segment (AB, AN) for which AB is less than or equal to B.
If AB is less than B, insert a new segment (B, N) after (AB, AN).
Find the last segment (YB, YN) for which YB is less than or equal to M.
Replace by (XB, N) any segment (XB, NOGROUP) for which XB is greater
than B and less than YB.
If YN is not N, and either YN is NOGROUP or YB is less than or equal to B,
    Let (ZB, ZN) be the segment following (YB, YN).
    If M+1 is less than ZB, insert a new segment (M+1, YN)
    before (ZB, ZN).
    Replace (YB, YN) by (YB, N).

Rewriting HTML Resources

As explained above with reference to FIGURE 3 (B5), when a reflector or
repeater serves a resource which itself includes resource identifiers (e.g., an HTML
resource), that resource is modified (rewritten) to pre-reflect resource identifiers (URLs)
of repeatable resources that appear in the resource. Rewriting ensures that when a
browser requests repeatable resources identified by the requested resource, it gets them
from a repeater without going back to the origin server, but when it requests
non-repeatable resources identified by the requested resource, it will go directly to the origin
server. Without this optimization, the browser would either make all requests at the
origin server (increasing traffic at the origin server and necessitating far more
redirections from the origin server), or it would make all requests at the repeater (causing
the repeater to redundantly request and copy resources which could not be cached,
increasing the overhead of serving such resources).

Rewriting requires that a repeater has been selected (as described above with
reference to the Best Repeater Selector). Rewriting uses a so-called BASE directive.
The BASE directive lets the HTML identify a different base server. (The base address is
normally the address of the HTML resource.)
Rewriting is performed as follows:

F1. A BASE directive is added at the beginning of the HTML resource, or
modified where necessary. Normally, a browser interprets relative URLs
as being relative to the default base address, namely, the URL of the
HTML resource (page) in which they are encountered. The BASE
address added specifies the resource at the reflector which originally
served the resource. This means that unprocessed relative URLs (such as
those generated by Javascript programs) will be interpreted as relative
to the reflector. Without this BASE address, browsers would combine
relative addresses with repeater names to create URLs which were not in
the form required by repeaters (as described above in step D1).

F2. The rewriter identifies directives, such as images and anchors,
containing URLs. If the rewriter is running in a reflector, it must parse
the HTML file to identify these directives.
If it is running in a repeater, the rewriter may have access to
pre-computed information that identifies the location of each URL (placed in
the HTML file in step F4).

F3. For each URL encountered in the resource to be re-written, the rewriter
must determine whether the URL is repeatable (as in steps B1-B2). If
the URL is not repeatable, it is not modified. On the other hand, if the
URL is repeatable, it is modified to refer to the selected repeater.

F4. After all URLs have been identified and modified, if the resource is being
served to a repeater, a table is appended at the beginning of the resource
that identifies the location and content of each URL encountered in the
resource. (This step is an optimization which eliminates the need for
parsing HTML resources at the repeater.)

F5. Once all changes have been identified, a new length is computed for the
resource (page). The length is inserted in the HTTP header prior to
serving the resource.
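
Steps F2, F3 and F5 can be illustrated with a deliberately simplified rewriter. It is a sketch under several assumptions: URLs are located with a regular expression over quoted href and src attributes rather than a real HTML parser, the rewritten form is taken to be `http://<repeater>/<origin server>/<path>`, and the `is_repeatable` callable stands in for the rule base. The BASE directive (F1) and the URL table (F4) are omitted.

```python
import re
from urllib.parse import urlsplit

# Matches href="..." or src='...' and captures prefix, quote and URL.
URL_ATTR = re.compile(r'''(\b(?:href|src)\s*=\s*)(["'])(.*?)\2''',
                      re.IGNORECASE)


def rewrite_html(html, repeater, is_repeatable):
    """Point repeatable absolute http URLs at the selected repeater."""
    def replace(match):
        prefix, quote, url = match.groups()
        parts = urlsplit(url)
        if parts.scheme == "http" and parts.netloc and is_repeatable(url):
            path = parts.path or "/"
            if parts.query:
                path += "?" + parts.query
            url = f"http://{repeater}/{parts.netloc}{path}"
        return f"{prefix}{quote}{url}{quote}"
    return URL_ATTR.sub(replace, html)


def content_length(html):
    """Length in bytes of the rewritten page, for the HTTP header (F5)."""
    return len(html.encode("utf-8"))
```

URLs that the rule base marks as non-repeatable are left untouched, so the browser continues to fetch them from the origin server.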

An extension of HTML, known as XML, is currently being developed. The
process of rewriting URLs will be similar for XML, with some differences in the
mechanism that parses the resource and identifies embedded URLs.

Handling Non-HTTP Protocols 

This invention makes it possible to reflect references to resources that are served
by protocols other than HTTP, for instance, the File Transfer Protocol (FTP) and
audio/video stream protocols. However, many protocols do not provide the ability to
redirect requests. It is, however, possible to redirect references before requests are
actually made by rewriting URLs embedded in HTML pages. The following
modifications to the above algorithms are used to support this capability.

In F4, the rewriter rewrites URLs for servers if those servers appear in a
configurable table of cooperating origin servers or so-called co-servers. The reflector
operator can define this table to include FTP servers and other servers. A rewritten
URL that refers to a non-HTTP resource takes the form:

http://<repeater>/<origin server>@proxy=<scheme>[:<type>]@/resource
where <scheme> is a supported protocol name such as "ftp". This URL format is an
alternative to the form shown in B3.

In C3, the repeater looks for a protocol embedded in the arriving request. If a
protocol is present and the requested resource is not already cached, the repeater uses
the selected protocol instead of the default HTTP protocol to request the resource when
serving it and storing it in the cache.
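
Building and recognizing the non-HTTP URL form might look like the sketch below. It assumes that the character following "proxy" in the URL form is "=", which is a reading of a damaged original, and the function names and the regular expression are hypothetical.

```python
import re


def build_proxy_url(repeater, origin, scheme, resource, type_=None):
    """Build a rewritten URL for a resource served by a non-HTTP protocol."""
    tag = scheme if type_ is None else f"{scheme}:{type_}"
    return f"http://{repeater}/{origin}@proxy={tag}@/{resource.lstrip('/')}"


# Path layout assumed: /<origin>@proxy=<scheme>[:<type>]@/<resource>
_PROXY_PATH = re.compile(
    r"^/(?P<origin>[^/@]+)@proxy=(?P<scheme>[^:@]+)"
    r"(?::(?P<type>[^@]+))?@/(?P<resource>.*)$"
)


def parse_proxy_path(path):
    """Return (origin, scheme, type, resource), or None for ordinary paths."""
    match = _PROXY_PATH.match(path)
    if match is None:
        return None
    return (match.group("origin"), match.group("scheme"),
            match.group("type"), match.group("resource"))
```

A repeater that finds a scheme in the arriving path would then use that protocol, rather than HTTP, when fetching an uncached resource.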

System Configuration and Management

In addition to the processing described above, the repeater network requires 
various mechanisms for system configuration and network management. Some of these 
mechanisms are described here.
Reflectors allow their operators to synchronize repeater caches by performing
publishing operations. The process of keeping repeater caches synchronized is
described below. Publishing indicates that a resource or collection of resources has
changed.

Repeaters and reflectors participate in various types of log processing. The
results of logs collected at repeaters are collected and merged with logs collected at
reflectors, as described below.



Adding Subscribers to the Repeater Network

When a new subscriber is added to the network, information about the
subscriber is entered in a Subscriber Table at the master repeater and propagated to all
repeaters in the network. This information includes the Committed Aggregate Information
Rate (CAIR) for servers belonging to the subscriber, and a list of the repeaters that may
be used by servers belonging to the subscriber.



Adding Reflectors to the Repeater Network 

When a new reflector is added to the network, it simply connects to and
announces itself to a contact repeater, preferably using a securely encrypted certificate
including the repeater's subscriber identifier.

The contact repeater determines whether the reflector network address is
permitted for this subscriber. If it is, the contact repeater accepts the connection and
updates the reflector with all necessary tables (using version numbers to determine
which tables are out of date).

The reflector processes requests during this time, but is not "enabled" (allowed
to reflect requests) until all of its tables are current.

Keeping Repeater Caches Synchronized

Repeater caches are coherent, in the sense that when a change to a resource is
identified by a reflector, all repeater caches are notified, and accept the change in a single
transaction.

Only the identifier of the changed resource (and not the entire resource) is
transmitted to the repeaters; the identifier is used to effectively invalidate the
corresponding cached resource at the repeater. This process is far more efficient than
broadcasting the content of the changed resource to each repeater.

A repeater will load the newly modified resource the next time it is requested.
A resource change is identified at the reflector either manually by the operator,
or through a script when files are installed on the server, or automatically through a
change detection mechanism (e.g., a separate process that checks regularly for changes).
A resource change causes the reflector to send an "invalidate" message to its
contact repeater, which forwards the message to the master repeater. The invalidate
message contains a list of resource identifiers (or regular expressions identifying patterns
of resource identifiers) that have changed. (Regular expressions are used to invalidate a
directory, or an entire server.) The repeater network uses a two-phase commit process to
ensure that all repeaters correctly invalidate a given resource.
The invalidation process operates as follows:

The master broadcasts a "phase 1" invalidation request to all repeaters indicating 
the resources and regular expressions describing sets of resources to be invalidated. 

When each repeater receives the phase 1 message, it first places the resource 
identifiers or regular expressions into a list of resource identifiers pending invalidation. 

Any resource requested (in C3) that is in the pending invalidation list may not be 
served from the cache. This prevents the cache from requesting the resource from a 
peer cache which may not have received an invalidation notice. Were it to request a 
resource in this manner, it might replace the newly invalidated resource by the same, 
now stale, data. 

The repeater then compares the resource identifier of each resource in its cache 
against the resource identifiers and regular expressions in the list. 

Each match is invalidated by marking it stale and optionally removing it from the 
cache. This means that a future request for the resource will cause it to retrieve a new 
copy of the resource from the reflector. 

When the repeater has completed the invalidation, it returns an acknowledgment 
to the master. The master waits until all repeaters have acknowledged the invalidation 
request. 

If a repeater fails to acknowledge within a given period, it is disconnected from 
the master repeater. When it reconnects, it will be told to flush its entire cache, which 
will eliminate any consistency problem. (To avoid flushing the entire cache, the master 
could keep a log of all invalidations performed, sorted by date, and flush only files 
invalidated since the last time the reconnecting repeater successfully completed an 
invalidation. In the presently preferred embodiments this is not done since it is believed 
that repeaters will seldom disconnect.) 

When all repeaters have acknowledged invalidation (or timed out), the master repeater 
broadcasts a "phase 2" invalidation request to all repeaters. This causes the repeaters to 
remove the corresponding resource identifiers and regular expressions from the list of 
resource identifiers pending invalidation. 

In another embodiment, the invalidation request will be extended to allow a 
"server push". In such requests, after phase 2 of the invalidation process has completed, 
the repeater receiving the invalidation request immediately requests a new copy of the 
invalidated resource to place in its cache. 
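
The two-phase invalidation described above can be sketched roughly as follows. This is a 
minimal illustration under stated assumptions, not the implementation of the invention: 
the names Repeater, phase1, phase2, is_pending and invalidate are hypothetical, every 
entry of the invalidate message is treated as a regular expression (a literal identifier 
would need escaping), and timeouts are reduced to a missing acknowledgment. 

```python
import re


class Repeater:
    """Hypothetical repeater with a cache and a pending-invalidation list."""

    def __init__(self, cache):
        self.cache = dict(cache)   # resource identifier -> cached content
        self.pending = []          # compiled patterns pending invalidation

    def is_pending(self, resource_id):
        # A resource on the pending list may not be served from the cache.
        return any(p.fullmatch(resource_id) for p in self.pending)

    def phase1(self, patterns):
        # First record the patterns, then invalidate every matching entry.
        self.pending.extend(re.compile(p) for p in patterns)
        for resource_id in [r for r in self.cache if self.is_pending(r)]:
            del self.cache[resource_id]   # the optional removal step
        return True                       # acknowledgment to the master

    def phase2(self, patterns):
        # Drop the patterns once all repeaters have acknowledged or timed out.
        done = set(patterns)
        self.pending = [p for p in self.pending if p.pattern not in done]


def invalidate(repeaters, patterns):
    """Master side: broadcast phase 1, collect acknowledgments, then phase 2."""
    acknowledged = [r for r in repeaters if r.phase1(patterns)]
    for r in repeaters:
        r.phase2(patterns)
    return acknowledged
```

Between the two phases a repeater refuses to serve matching resources from its cache, 
which is what keeps a peer cache from reintroducing stale data. 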



Logs and Log Processing 

Web server activity logs are fundamental to monitoring the activity in a Web site. 
This invention creates "merged logs" that combine the activity at reflectors with the 
activity at repeaters, so that a single activity log appears at the origin server showing all 
Web resource requests made on behalf of that site at any repeater. 

This merged log can be processed by standard processing tools, as if it had been 
generated locally. 

On a periodic basis, the master repeater (or its delegate) collects logs from each 
repeater. The logs collected are merged, sorted by reflector identifier and timestamp, 
and stored in a dated file on a per-reflector basis. The merged log for a given reflector 
represents the activity of all repeaters on behalf of that reflector. On a periodic basis, as 
configured by the reflector operator, a reflector contacts the master repeater to request 
its merged logs. It downloads these and merges them with its locally maintained logs, 
sorting by timestamp. The result is a merged log that represents all activity on behalf of 
repeaters and the given reflector. 
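
The two merge steps can be illustrated with a short sketch. The record layout (a tuple of 
reflector identifier, timestamp and text) and the function names merge_at_master and 
merge_at_reflector are assumptions made for this example; the source does not specify a 
log format. 

```python
from collections import defaultdict
import heapq


def merge_at_master(repeater_logs):
    """Group entries from all repeaters by reflector, sorted by timestamp.

    repeater_logs: iterable of lists of (reflector_id, timestamp, text).
    Returns {reflector_id: [(timestamp, text), ...]}.
    """
    per_reflector = defaultdict(list)
    for log in repeater_logs:
        for reflector_id, timestamp, text in log:
            per_reflector[reflector_id].append((timestamp, text))
    return {rid: sorted(entries) for rid, entries in per_reflector.items()}


def merge_at_reflector(local_log, merged_from_master):
    """Merge two logs that are each already sorted by timestamp."""
    return list(heapq.merge(local_log, merged_from_master))
```

Because both inputs to merge_at_reflector are already sorted, a streaming merge is enough 
and the combined log never has to be re-sorted as a whole. 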

Activity logs are optionally extended with information important to the repeater 
network, if the reflector is configured to do so by the reflector operator. In particular, 
an "extended status code" indicates information about each request, such as: 

1. request was served by a reflector locally; 

2. request was reflected to a repeater;* 

3. request was served by a reflector to a repeater;* 

4. request for non-repeatable resource was served by repeater;* 

5. request was served by a repeater from the cache; 

6. request was served by a repeater after filling cache; 

7. request pending invalidation was served by a repeater. 

(The activities marked with "*" represent intermediate states of a request and do not 
normally appear in a final activity log.) 

In addition, activity logs contain a duration, and extended precision timestamps. 

The duration makes it possible to analyze the time required to serve a resource, the 
bandwidth used, the number of requests handled in parallel at a given time, and other 
quite useful information. The extended precision timestamp makes it possible to 
accurately merge activity logs. 

Repeaters use the Network Time Protocol (NTP) to maintain synchronized 
clocks. Reflectors may either use NTP or calculate a time bias to provide roughly 
accurate timestamps relative to their contact repeater. 



Enforcing Committed Aggregate Information Rate 

The repeater network monitors and limits the aggregate rate at which data is 
served on behalf of a given subscriber by all repeaters. This mechanism provides the 
following benefits: 

1. provides a means of pricing repeater service; 

2. provides a means for estimating and reserving capacity at repeaters; 

3. provides a means for preventing clients of a busy site from limiting access to 
other sites. 

For each subscriber, a "threshold aggregate information rate" (TAIR) is 
configured and maintained at the master repeater. This threshold is not necessarily the 
committed rate; it may be a multiple of the committed rate, based on a pricing policy. 

Each repeater measures the information rate component of each reflector for 
which it serves resources, periodically (typically about once a minute), by recording the 
number of bytes transmitted on behalf of that reflector each time a request is delivered. 
The table thus created is sent to the master repeater once per period. The master 
repeater combines the tables from each repeater, summing the measured information of 
each reflector over all repeaters that serve resources for that reflector, to determine the 
"measured aggregate information rate" (MAIR) for each reflector. 

If the MAIR for a given reflector is greater than the TAIR for that reflector, the 
MAIR is transmitted by the master to all repeaters and to the respective reflector. 
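
As a rough illustration of the aggregation step, the sketch below sums per-repeater byte 
tables into a MAIR per reflector and selects the reflectors whose MAIR exceeds their TAIR. 
The function names, the dictionary layout and the choice of bytes per second as the unit 
are assumptions for the example. 

```python
from collections import Counter


def measured_aggregate_rates(tables, period_seconds=60.0):
    """Sum per-repeater byte counts and convert them to bytes per second.

    tables: one {reflector_id: bytes_sent_this_period} dict per repeater.
    """
    totals = Counter()
    for table in tables:
        totals.update(table)   # Counter.update adds the counts together
    return {rid: sent / period_seconds for rid, sent in totals.items()}


def over_threshold(mair, tair):
    """Return the MAIR entries the master would transmit (MAIR > TAIR)."""
    return {rid: rate for rid, rate in mair.items()
            if rate > tair.get(rid, float("inf"))}
```

A reflector with no configured TAIR is treated here as unlimited, which is a choice made 
for the sketch rather than something the text states. 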

When a reflector receives a request, it determines whether its most recently 
calculated MAIR is greater than its TAIR. If this is the case, the reflector 
probabilistically decides whether to suppress reflection, by serving the request locally (in 
B2). The probability of suppressing the reflection increases as an exponential function 
of the difference between the MAIR and the CAIR. 

Serving a request locally during a peak period may strain the local origin server, 
but it prevents this subscriber from taking more than allocated bandwidth from the 
shared repeater network. 
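
The text does not give the exact function, so the following is only one possible reading: 
a saturating exponential of the relative excess of the MAIR over the threshold. The 
function names, the steepness parameter and the normalization by the threshold are all 
assumptions made for illustration. 

```python
import math
import random


def suppression_probability(mair, tair, steepness=5.0):
    """Probability of serving locally; 0 at or below the threshold.

    Assumed form: 1 - exp(-steepness * (mair - tair) / tair), with tair > 0.
    """
    if mair <= tair:
        return 0.0
    excess = (mair - tair) / tair
    return 1.0 - math.exp(-steepness * excess)


def should_serve_locally(mair, tair, rng=random.random):
    """Draw once and decide whether to suppress reflection for this request."""
    return rng() < suppression_probability(mair, tair)
```

Passing the random source as a parameter keeps the decision reproducible in tests; a 
deployment would use the default generator. 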

When a repeater receives a request for a given subscriber (in C2), it determines 
whether the subscriber is running near its threshold aggregate information rate. If this is 
the case, it probabilistically decides whether to reduce its load by redirecting the request 
back to the reflector. The probability increases exponentially as the reflector's aggregate 
information rate approaches its limit. 

If a request is reflected back to a reflector, a special character string is attached to 
the resource identifier so that the receiving reflector will not attempt to reflect it again. 
In the current system, this string has the form 

"src=overload". 

The reflector tests for this string in B2. 

The mechanism for limiting Aggregate Information Rate described above is 
fairly coarse. It limits at the level of sessions with clients (since once a client has been 
reflected to a given repeater, the rewriting process tends to keep the client coming back 
to that repeater) and, at best, individual requests for resources. A more fine-grained 
mechanism for enforcing TAIR limits within repeaters operates by reducing the 
bandwidth consumption of a busy subscriber when other subscribers are competing for 
bandwidth. 

The fine-grained mechanism is a form of data "rate shaping". It extends the 
mechanism that copies resource data to a connection when a reply is being sent to a 
client. When an output channel is established at the time a request is received, the 
repeater identifies which subscriber the channel is operating for, in C2, and records the 
subscriber in a data field associated with the channel. Each time a "write" operation is 
about to be made to the channel, the Metered Output Stream first inspects the current 
values of the MAIR and TAIR, calculated above, for the given subscriber. If the MAIR 
is larger than the TAIR, then the mechanism pauses briefly before performing the write 
operation. The length of the pause is proportional to the amount the MAIR exceeds the 
TAIR. The pause ensures that tasks sending other resources to other clients, perhaps on 
behalf of other subscribers, will have an opportunity to send their data. 
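
A metered output stream of this kind might look like the sketch below. The class layout, 
the rates dictionary and the proportionality constant seconds_per_unit_excess are 
assumptions; the source only states that the pause is proportional to the excess. 

```python
import time


class MeteredOutputStream:
    """Wraps a writable stream and pauses before writes for busy subscribers."""

    def __init__(self, raw, subscriber, rates,
                 seconds_per_unit_excess=0.001, sleep=time.sleep):
        self.raw = raw                    # underlying channel with write()
        self.subscriber = subscriber      # recorded when the channel is set up
        self.rates = rates                # subscriber -> (MAIR, TAIR)
        self.seconds_per_unit_excess = seconds_per_unit_excess
        self.sleep = sleep                # injectable for testing

    def write(self, data):
        mair, tair = self.rates[self.subscriber]
        if mair > tair:
            # Pause length is proportional to the amount MAIR exceeds TAIR.
            self.sleep(self.seconds_per_unit_excess * (mair - tair))
        return self.raw.write(data)
```

Because the rates are read on every write, an updated MAIR takes effect in the middle of 
a reply without reopening the channel. 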

Repeater Network Resilience 

The repeater network is capable of recovering when a repeater or network 
connection fails. 

A repeater cannot operate unless it is connected to the master repeater. The 
master repeater exchanges critical information with other repeaters, including 
information about repeater load, aggregate information rate, subscribers, and link cost. 

If a master fails, a "succession" process ensures that another repeater will take 
over the role of master, and the network as a whole will remain operational. If a master 
fails or a connection to a master fails through a network problem, any repeater 
attempting to communicate with the master will detect the failure, either through an 
indication from TCP/IP or by timing out from a regular exchange with 
the master. 

When any repeater is disconnected from its master, it immediately tries to 
reconnect to a series of potential masters based on a configurable 
"succession list". 

The repeater tries each system on the list in succession until it successfully 
connects to a master. If in this process, it comes to its own name, it takes on the role of 
master, and accepts connections from other repeaters. If a repeater which is not at the 
top of the list becomes the master, it is called the "temporary master". 

A network partition may cause two groups of repeaters each to elect a master. 
When the partition is corrected, it is necessary that the more senior master take over the 
network. Therefore, when a repeater is temporary master, it regularly tries to reconnect 
to any master above it in the succession list. If it succeeds, it immediately disconnects 
from all of the repeaters connected to it. When they retry their succession lists, they will 
connect to the more senior master repeater. 

To prevent losses of data, a temporary master does not accept configuration 
changes and does not process log files. In order to take on these tasks, it must be 
informed that it is primary master by manual modification of its successor list. Each 
repeater regularly reloads its successor list to determine whether it should change its idea 
of who the master is. 
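
The succession procedure can be sketched as a walk down the list. The function names and 
the can_connect callback (standing in for a real connection attempt) are hypothetical. 

```python
def find_master(succession_list, own_name, can_connect):
    """Walk the succession list in order and return the name of the master.

    Returns own_name when the walk reaches this repeater's own entry, meaning
    it takes on the role of master; returns None if the list is exhausted.
    """
    for name in succession_list:
        if name == own_name:
            return own_name
        if can_connect(name):
            return name
    return None


def is_temporary_master(succession_list, own_name, master):
    """A master that is not at the top of the list is a temporary master."""
    return master == own_name and succession_list[0] != own_name
```

A temporary master would call find_master again at intervals; as soon as a more senior 
entry answers, it drops its own connections so that its repeaters repeat the same walk. 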

If a repeater is disconnected from the master, it must resynchronize its cache 
when it reconnects to the master. The master can maintain a list of recent cache 
invalidations and send to the repeater any invalidations it was not able to process while 
disconnected. If this list is not available for some reason (for instance, because the 
repeater has been disconnected too long), the repeater must invalidate its entire cache. 

A reflector is not permitted to reflect requests unless it is connected to a 
repeater. The reflector relies on its contact repeater for critical information, such as load 
and Link Cost Tables, and current aggregate information rate. A reflector that is not 
connected to a repeater can continue to receive requests and handle them locally. 

If a reflector loses its connection with a repeater, due to a repeater failure or 
network outage, it continues to operate while it tries to connect to a repeater. 

Each time a reflector attempts to connect to a repeater, it uses DNS to identify a 
set of candidate repeaters given a domain name that represents the repeater network. 
The reflector tries each repeater in this set until it makes a successful contact. Until a 
successful contact is made, the reflector serves all requests locally. When a reflector 
connects to a repeater, the repeater can tell it to attempt to contact a different repeater; 
this allows the repeater network to ensure that no individual repeater has too many 
contacts. 

When contact is made, the reflector provides the version number of each of its 
tables to its contact repeater. The repeater then decides which tables should be updated 
and sends appropriate updates to the reflector. Once all tables have been updated, the 
repeater notifies the reflector that it may now start reflecting requests. 
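
The version comparison performed by the contact repeater might be as simple as the 
following sketch, which assumes integer version numbers and hypothetical table names. 

```python
def tables_to_update(reflector_versions, repeater_versions):
    """Return the names of tables the reflector is missing or holds stale.

    Both arguments map table name -> integer version number (an assumption).
    """
    return sorted(
        name for name, version in repeater_versions.items()
        if reflector_versions.get(name, -1) < version
    )
```

An empty result means every table is current, at which point the repeater can notify the 
reflector that it may start reflecting requests. 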

Using a Proxy Cache within a Repeater 

Repeaters are intentionally designed so that a proxy cache can be used as a 
component within them. This is possible because the repeater receives HTTP requests 
and converts them to a form recognized by the proxy cache. 

On the other hand, several modifications to the proxy cache 
may be made as optimizations. These include, in particular, the ability to conveniently 
invalidate a resource, the ability to support cache quotas, and the ability to avoid making 
an extra copy of each resource as it passes from the proxy cache through the repeater to 
the requester. 

In a preferred embodiment, the proxy 
cache is dedicated for use only by one or more repeaters. Each repeater requiring a 
resource from the proxy cache constructs a proxy request from the incoming 
request. A normal HTTP GET request to a server contains only the pathname part of 
the URL; the scheme and server name are implicit. (In an HTTP GET request to a 
repeater, the pathname part of the URL includes the name of the origin server on behalf 
of which the request is being made, as described above.) However, a proxy agent GET 
request takes an entire URL. Therefore, the repeater must construct a proxy request 
containing the entire URL from the path portion of the URL it receives. Specifically, if 
the incoming request takes the form: 

GET /<origin server>/<path> 

then the repeater constructs a proxy request of the form: 

GET http://<origin server>/<path> 

and if the incoming request takes the form: 

GET /<origin server>@proxy=<scheme>:<type>@/<path> 

then the repeater constructs a proxy request of the form: 

GET <scheme>://<origin server>/<path> 
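
The conversion can be sketched as follows, assuming the second request form embeds the 
scheme as "@proxy=<scheme>:<type>@" directly after the origin server name. The function 
name proxy_request_url and the two regular expressions are hypothetical. 

```python
import re

# Assumed form: /<origin server>@proxy=<scheme>:<type>@/<path>
_PROXY_FORM = re.compile(
    r"^/(?P<server>[^/@]+)@proxy=(?P<scheme>[^:@]+):(?P<type>[^@]*)@/(?P<path>.*)$"
)
# Plain form: /<origin server>/<path>
_PLAIN_FORM = re.compile(r"^/(?P<server>[^/@]+)/(?P<path>.*)$")


def proxy_request_url(request_path):
    """Build the full URL for the proxy cache from a repeater request path."""
    m = _PROXY_FORM.match(request_path)
    if m:
        return f"{m['scheme']}://{m['server']}/{m['path']}"
    m = _PLAIN_FORM.match(request_path)
    if m:
        return f"http://{m['server']}/{m['path']}"
    raise ValueError(f"unrecognized repeater request path: {request_path!r}")
```

For example, proxy_request_url("/www.example.com/a/b.html") yields 
"http://www.example.com/a/b.html". 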
Cache Control 

HTTP replies contain directives called cache control directives, which are used 
to indicate to a cache whether the attached resource may be cached and if so, when it 
should expire. A Web site administrator configures the Web site to attach appropriate 
directives. Often, the administrator will not know how long a page will be fresh, and 
must define a short expiration time to try to prevent caches from serving stale data. In 
many cases, a Web site operator will indicate a short expiration time only in order to 
receive the requests (or hits) that would otherwise be masked by the presence of a cache. 
This is known in the industry as "cache-busting". Although some cache operators may 
consider cache-busting to be impolite, advertisers who rely on this information may 
consider it imperative. 

When a resource is stored in a repeater, its cache directives can be ignored by the 
repeater, because the repeater receives explicit invalidation events to determine when a 
resource is stale. When a proxy cache is used as the cache at the repeater, the associated 
cache directives may be temporarily disabled. However, they must be re-enabled when 
the resource is served from the cache to a client, in order to permit the cache-control 
policy (including any cache-busting) to take effect. 

The present invention contains mechanisms to prevent the proxy cache within a 
repeater from honoring cache control directives, while permitting the directives to be 
served from the repeater. 

When a reflector serves a resource to a repeater in B4, it replaces all cache 
directives by modified directives that are ignored by the repeater proxy cache. It does 
this by prefixing a distinctive string such as "wr-" to the beginning of the HTTP tag. 
Thus "expires" becomes "wr-expires", and "cache-control" becomes 
"wr-cache-control". This prevents the proxy cache itself from honoring the directives. 
When a repeater serves a resource in C4, and the requesting client is not another 
repeater, it searches for HTTP tags beginning with "wr-" and removes the "wr-". This 
allows the clients retrieving the resource to honor the directives. 
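
The masking and unmasking of cache directives can be sketched as a pair of header 
transformations. The function names and the set of header names treated as cache 
directives are assumptions; the text names only "expires" and "cache-control" as examples. 

```python
CACHE_DIRECTIVES = {"expires", "cache-control"}   # assumed set of directive names
PREFIX = "wr-"


def mask_cache_directives(headers):
    """Reflector side (B4): hide cache directives from the repeater's proxy cache."""
    return {
        (PREFIX + name if name.lower() in CACHE_DIRECTIVES else name): value
        for name, value in headers.items()
    }


def unmask_cache_directives(headers):
    """Repeater side (C4): restore the directives before replying to a client."""
    return {
        (name[len(PREFIX):] if name.lower().startswith(PREFIX) else name): value
        for name, value in headers.items()
    }
```

When the requester is another repeater, the unmasking step would be skipped so that the 
peer's proxy cache also ignores the directives. 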

Resource Revalidation 

There are several cases where a resource may be cached so long as the origin 
server is consulted each time it is served. In one case, the request for the resource is 
attached to a so-called "cookie". The origin server must be presented with the cookie 
to record the request and determine whether the cached resource may be served. In 
another case, the request for the resource is attached to an authentication header (which 
identifies the requester with a user id and password). Each new request for the resource 
must be tested at the origin server to assure that the requester is authorized to access the 
resource. 

The HTTP 1.1 specification defines a reply header titled "Must-Revalidate" 
which allows an origin server to instruct a proxy cache to "revalidate" a resource each 
time a request is received. Normally, this mechanism is used to determine whether a 
resource is still fresh. In the present invention, Must-Revalidate makes it possible to ask 
an origin server to validate a request that is otherwise served from a repeater. 

The reflector rule base contains information that determines which resources 
may be repeated but must be revalidated each time they are served. For each such 
resource, in B4, the reflector attaches a Must-Revalidate header. Each time a request 
comes to a repeater for a cached resource marked with a Must-Revalidate header, the 
request is forwarded to the reflector for validation prior to serving the requested 
resource. 

Cache Quotas 

The cache component of a repeater is shared among those subscribers that 
reflect clients to that repeater. In order to allow subscribers fair access to storage 
facilities, the cache may be extended to support quotas. 

Normally, a proxy cache may be configured with a disk space threshold T. 
Whenever more than T bytes are stored in the cache, the cache attempts to find 
resources to eliminate. 

Typically a cache uses the least-recently-used (LRU) algorithm to determine 
which resources to eliminate; more sophisticated caches use other algorithms. A cache 
may also support several threshold values, for instance, a lower threshold which, when 
reached, causes a low priority background process to remove items from the cache, and 
a higher threshold which, when reached, prevents resources from being cached until 
sufficient free disk space has been reclaimed. 

If two subscribers A and B share a cache, and more resources of subscriber A 
are accessed during a period of time than resources of subscriber B, then fewer of B's 
resources will be in the cache when new requests arrive. It is possible that, due to the 
behavior of A, B's resources will never be cached when they are requested. In the 
present invention, this behavior is undesirable. To address this issue, the invention 
extends the cache at a repeater to support cache quotas. 

The cache records the amount of space used by each subscriber in D_S, and 
supports a configurable threshold T_S for each subscriber. 

Whenever a resource is added to the cache (at C3), the value D_S is updated for 
the subscriber providing the resource. If D_S is larger than T_S, the cache attempts to find 
resources to eliminate, from among those resources associated with subscriber S. The 
cache is effectively partitioned into separate areas for each subscriber. 

The original threshold T is still supported. If the sum of reserved segments for 
each subscriber is smaller than the total space reserved in the cache, the remaining area 
is "common" and subject to competition among subscribers. 

Note, this mechanism might be implemented by modifying the existing proxy 
cache discussed above, or it might also be implemented without modifying the proxy 
cache, if the proxy cache at least makes it possible for an external program to obtain a 
list of resources in the cache, and to remove a given resource from the cache. 
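
A per-subscriber quota with least-recently-used eviction could be sketched as below. This 
is an illustration only: the class QuotaCache and its methods are hypothetical, sizes are 
measured as the length of the stored bytes, and the global threshold T and the "common" 
area are left out for brevity. 

```python
from collections import OrderedDict


class QuotaCache:
    """Cache partitioned by subscriber, each with its own space threshold."""

    def __init__(self, quotas):
        self.quotas = dict(quotas)                        # T_S per subscriber
        self.used = {s: 0 for s in quotas}                # D_S per subscriber
        self.items = {s: OrderedDict() for s in quotas}   # oldest entry first

    def add(self, subscriber, key, data):
        items = self.items[subscriber]
        if key in items:
            self.used[subscriber] -= len(items.pop(key))
        items[key] = data
        self.used[subscriber] += len(data)
        # Evict only this subscriber's least recently used resources.
        while self.used[subscriber] > self.quotas[subscriber] and items:
            _, evicted = items.popitem(last=False)
            self.used[subscriber] -= len(evicted)

    def get(self, subscriber, key):
        items = self.items[subscriber]
        if key not in items:
            return None
        items.move_to_end(key)                            # mark as recently used
        return items[key]
```

Because eviction never crosses subscriber boundaries, heavy traffic for subscriber A 
cannot push subscriber B's resources out of the cache. 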

Rewriting from Repeaters 

When a repeater receives a request for a resource, its proxy cache may be 
configured to determine whether a peer cache contains the requested resource. If so, 
the proxy cache obtains the resource from the peer cache, which can be faster than 
obtaining it from the origin server (the reflector). However, a consequence of this is that 
rewritten HTML resources retrieved from the peer cache would identify the wrong 
repeater. Thus, to allow for cooperating proxy caches, resources are preferably rewritten 
at the repeater. 

When a resource is rewritten for a repeater, a special tag is placed at the 
beginning of the resource. When constructing a reply, the repeater inspects the tag to 
determine whether the resource indicates that additional rewriting is necessary. If so, the 
repeater modifies the resource by replacing references to the old repeater with references 
to the new repeater. 

It is only necessary to perform this rewriting when a resource is served to the 
proxy cache at another repeater. 

Repeater-Side Include 

Sometimes, an origin server constructs a custom resource for each request (for 
instance, when inserting an advertisement based on the history of the requesting client). 
In such a case, that resource must be served locally rather than repeated. Generally, a 
custom resource contains, along with the custom information, text and references to 
other, repeatable, resources. 

The process that assembles a "page" from a text resource and possibly one or 
more image resources is performed by the Web browser, directed by HTML. However, 
it is not possible using HTML to cause a browser to assemble a page using text or 
directives from a separate resource. Therefore, custom resources often necessarily 
contain large amounts of static text that would otherwise be repeatable. 

To resolve this potential inefficiency, repeaters recognize a special directive 
called a "repeater side include". This directive makes it possible for the repeater to 
assemble a custom resource, using a combination of repeatable and local resources. In 
this way, the static text can be made repeatable, and only the special directive need be 
served locally by the reflector. 

For example, a resource X might consist of custom directives selecting an 
advertising banner, followed by a large text article. To make this resource repeatable, the 
Web site administrator must break out a second resource, Y, to select the banner. 
Resource X is modified to contain a repeater-side include directive identifying resource 
Y, along with the article. Resource Y is created and contains only the custom directives 
selecting an ad banner. Now resource X is repeatable, and only resource Y, which is 
relatively small, is not repeatable. 

When a repeater constructs a reply, it determines whether the resource being 
served is an HTML resource, and if so, scans it for repeater-side include directives. 
Each such directive includes a URL, which the repeater resolves and substitutes in place 
of the directive. The entire resource must be assembled before it is served, in order to 
determine its final size, as the size is included in a reply header ahead of the resource. 
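
The source does not give the syntax of the directive, so the sketch below invents one, an 
HTML comment of the form <!--#repeater-include url="..." -->, purely for illustration. 
The function name assemble and the resolve callback are likewise hypothetical. 

```python
import re

# Hypothetical directive syntax; the actual form is not given in the text.
INCLUDE = re.compile(r'<!--#repeater-include url="([^"]+)" -->')


def assemble(html, resolve):
    """Replace each include directive with the resource its URL resolves to.

    resolve: callable taking a URL and returning the text to substitute.
    Returns the assembled body and its size in bytes for the reply header.
    """
    body = INCLUDE.sub(lambda m: resolve(m.group(1)), html)
    return body, len(body.encode("utf-8"))
```

The size is computed only after every substitution has been made, which mirrors the 
requirement that the whole resource be assembled before the reply header is sent. 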
Thus, a method and apparatus for dynamically replicating selected resources in 
computer networks is provided. One skilled in the art will appreciate that the present 
invention can be practiced by other than the described embodiments, which are 
presented for purposes of illustration and not limitation, and the present invention is 
limited only by the claims that follow. 

What is claimed: 

1. A method of processing resource requests in a computer network, the 
method comprising, 
(i) by a client: 

(A) making a request for a particular resource from an origin server, 
the request including a resource identifier for the particular 
resource; 

(ii) by a reflector: 

(B) intercepting the request from the client to the origin server; 

(C) selecting a repeater to process the request; 

(D) providing to the client a modified resource identifier designating 
the repeater; 

(iii) by the client: 

(E) receiving the modified resource identifier from the reflector; and 

(F) making a request for the particular resource from the repeater 
designated in the modified resource identifier; 

(iv) by the repeater: 

(G) receiving the request from the client; and 

(H) returning the requested resource to the client. 

2. A method as in claim 1 further comprising, by the repeater: 

(I) making a request for the resource from the origin server; and 
(J) receiving the resource from the origin server. 

3. A method as in claim 1 wherein the selecting of a repeater by the 
reflector comprises: 

(C1) partitioning the network into groups; 
(C2) determining which group the client is in; 
(C3) selecting, from a plurality of repeaters in the network, a set of repeaters 
having a lowest cost relative to the group which the client is in; and 
(C4) selecting as the repeater a member of the selected set of repeaters. 

4. A method as in claim 3, wherein the cost of a repeater is a value based on 
that repeater's current load and a maximum load for that repeater. 

5. A method as in claim 3, wherein the cost of a repeater is a value based on 
a predicted cost or speed of transmission between the repeater and a client in the group. 

6. A method as in claim 1 wherein the particular resource itself contains at 
least one other resource identifier of at least one other resource, the method further 
comprising: 

rewriting the particular resource to replace at least some of the resource 
identifiers contained therein with modified resource identifiers designating a repeater 
instead of the origin server. 

7. A method as in claim 6 wherein the rewriting is performed by one of the 
repeater, the reflector or another repeater. 

8. A method of processing resource requests in a computer network, the 
method comprising, 
(i) by a client: 

(A) making a request for a particular resource from an origin server, 
the request including a resource identifier for the particular 
resource; 

(ii) by a reflector: 

(B) intercepting the request from the client to the origin server; 
(C) determining whether to reflect the request to a repeater; 
(D) when the reflector determines not to reflect the request, 
forwarding the request to the origin server; otherwise 
(D1) selecting a repeater to process the request; 
(D2) providing to the client a modified resource identifier 
designating the repeater. 

9. A method as in claim 8, further comprising, when the reflector 
determines to reflect the request: 

(iii) by the client: 

(E) receiving the modified resource identifier from the reflector; and 

(F) making a request for the particular resource from the repeater 
designated in the modified resource identifier; 

(iv) by the repeater: 

(G) receiving the request from the client; and 

(H) returning the requested resource to the client. 

10. A method as in claim 8 wherein the reflector determines whether to 
reflect a request by comparing the resource identifier with regular expression patterns of 
repeatable resources. 

11. A method as in claim 8, wherein the reflector has a threshold aggregate 
information rate (TAIR) associated therewith, and wherein the determining of whether 
to reflect the request to a repeater comprises: 

determining whether the TAIR of the reflector is exceeded by a measured 
aggregate information rate (MAIR) for the reflector, wherein the reflector determines 
not to reflect the request when the MAIR exceeds the TAIR for the reflector. 

12. A method as in claim 8, wherein the reflector has a threshold aggregate 
information rate (TAIR) associated therewith, and wherein the determining of whether 
to reflect the request to a repeater comprises: 

probabilistically determining whether the TAIR of the reflector is exceeded by a 
measured aggregate information rate (MAIR) for the reflector, wherein the reflector 
determines not to reflect the request as an exponential function of the difference 
between the MAIR and the TAIR. 

13. A method as in any of claims 11-12, wherein the MAIR is obtained from 
repeaters according to the rate at which they have transmitted data on behalf of the 
reflector during a given time interval. 

14. A method as in any one of claims 1-12 wherein the network is the 
Internet and wherein the resource identifier is a uniform resource locator (URL) for 
designating resources on the Internet, and wherein the modified resource identifier is a 
URL designating the repeater and indicating the reflector or origin server, and wherein 
the modified resource identifier is provided to the client using a REDIRECT message. 

15. In a computer network wherein clients request resources from origin 
servers, a method comprising: 

providing at least one repeater; 

providing reflectors at some of the origin servers, each reflector intercepting 
client resource requests made to its respective origin server; and 

each reflector selectively redirecting client resource requests for certain resources 
to one of the repeaters. 



16. A method as in claim 15 further comprising, by repeaters in the network: 
servicing redirected client resource requests; and 
selectively maintaining copies of requested resources, 

whereby resources corresponding to redirected resource requests are selectively 
migrated from their origin servers to one or more repeaters. 



17. A computer network comprising: 
a plurality of origin servers, at least some of the origin servers having reflectors 
associated therewith; 

a plurality of repeaters; and 
a plurality of clients, 

wherein each reflector is adapted to intercept resource requests made to its 
respective origin server and to selectively redirect the resource requests to a dynamically 
selected repeater. 



18. In a computer network wherein clients request resources from origin 
servers, a reflector mechanism associated with an origin server, the reflector mechanism 
comprising: 

means for intercepting a resource request made by a client of an origin server; 
means for analyzing the resource request to determine whether to service the 
request locally at the origin server; 

means for determining a best repeater in the network to service the request when 
the analyzing means determines that the request should not be serviced locally; and 
means for redirecting the client to the best repeater. 

19. A reflector mechanism as in claim 18 wherein the network is partitioned 
into groups and the means for determining the best repeater comprises: 

means for determining which group the client is in; 

means for selecting, from a plurality of repeaters in the network, a set of 
repeaters having a lowest cost relative to the group the client is in; and 

means for selecting as the best repeater a member of the set of repeaters. 

20. A reflector mechanism as in claim 19, wherein the cost of a repeater is a 
value based on a predicted cost or speed of transmission between the repeater and a 
client in the group. 

21. A mechanism as in claim 19, wherein the cost of a repeater is a value 
based on that repeater's current load and a maximum load for that repeater. 
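
By way of illustration only, the selection of a best repeater recited in claims 19-21 might be sketched as follows. The cost table layout, the `tolerance` parameter, the load ratio used as a cost, and the random choice among the lowest-cost set are assumptions made for this example.

```python
import random


def load_cost(current_load, max_load):
    """Hypothetical load-based cost: the fraction of capacity in use."""
    return current_load / max_load


def lowest_cost_set(costs_for_group, tolerance=0.0):
    """Return the repeaters whose cost for this group is within `tolerance`
    of the minimum, in sorted order."""
    best = min(costs_for_group.values())
    return sorted(r for r, c in costs_for_group.items() if c <= best + tolerance)


def best_repeater(client_group, cost_table, choose=random.choice):
    """Pick one member of the lowest-cost set for the client's group.
    `cost_table` maps group -> {repeater: cost}; `choose` is injectable."""
    return choose(lowest_cost_set(cost_table[client_group]))
```

Choosing at random among equally good repeaters is one simple way to spread requests across them; other tie-breaking rules would fit the claim language equally well.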

22. A reflector as in claim 16 wherein the resource itself contains resource 
identifiers, the reflector further comprising: 

means for rewriting the resource to replace at least some of the resource 
identifiers contained therein with modified resource identifiers designating the repeater 
instead of the origin server. 
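
By way of illustration only, the rewriting recited above might be sketched as follows for a resource whose embedded identifiers are absolute `http://` URLs. The regular expression, the modified URL layout, and the host names are assumptions made for this example; identifiers that name other servers are left unchanged, consistent with replacing "at least some" of them.

```python
import re


def rewrite_resource(body, origin_host, repeater_host):
    """Replace URLs that name the origin server with URLs that designate
    the repeater and carry the origin host as the first path segment."""
    pattern = re.compile(
        r"http://" + re.escape(origin_host) + r"(?![\w.\-:])(/[^\s\"'<>]*)?"
    )

    def replace(match):
        path = match.group(1) or "/"
        return "http://" + repeater_host + "/" + origin_host + path

    return pattern.sub(replace, body)
```

For example, with origin `www.example.com` and repeater `repeater1.example.net`, the identifier `http://www.example.com/a.gif` becomes `http://repeater1.example.net/www.example.com/a.gif`.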

23. In a computer network wherein clients request resources from origin 
servers, a repeater mechanism comprising: 

means for receiving a resource request from a client; 

means for determining whether the resource is available locally; 

means for, when it is determined that the resource is not available locally. 
means for providing the resource to the client. 

24. A reflector as in claim 18 wherein the resource itself contains resource 
identifiers, the repeater further comprising: 

means for rewriting the resource to replace at least some of the resource 
identifiers contained therein with modified resource identifiers designating the repeater 
instead of the origin server. 